Channel Estimation for Fading Channels

A. Taufiq Asyhari, Tobias Koch and Albert Guillen i Fabregas 



Abstract



We study the information rates of non-coherent, stationary, Gaussian, multiple-input multiple-output
(MIMO) flat-fading channels that are achievable with nearest neighbor decoding and pilot-aided channel
estimation. In particular, we investigate the behavior of these achievable rates in the limit as the
signal-to-noise ratio (SNR) tends to infinity by analyzing the capacity pre-log, which is defined as the
limiting ratio of the capacity to the logarithm of the SNR as the SNR tends to infinity. We demonstrate
that a scheme estimating the channel using pilot symbols and detecting the message using nearest neighbor
decoding (while assuming that the channel estimation is perfect) essentially achieves the capacity pre-log
of non-coherent multiple-input single-output flat-fading channels, and it essentially achieves the best so
far known lower bound on the capacity pre-log of non-coherent MIMO flat-fading channels. We then
extend our analysis to the multiple-access channel.
I. Introduction

The capacity of coherent multiple-input multiple-output (MIMO) channels increases with the
signal-to-noise ratio (SNR) as $\min(n_t, n_r) \log \mathrm{SNR}$, where $n_t$ and $n_r$ are the number of transmit
and receive antennas, respectively, and SNR denotes the SNR per receive antenna [1], [2].
The growth factor $\min(n_t, n_r)$ is sometimes referred to as the capacity pre-log [3] or spatial
multiplexing gain [4]. This capacity growth can be achieved using a nearest neighbor decoder
which selects the codeword that is closest (in a Euclidean distance sense) to the channel output. 
In fact, for coherent fading channels with additive Gaussian noise, this decoder is the maximum- 
likelihood decoder and is therefore optimal in the sense that it minimizes the error probability 
(see [5] and references therein). The coherent channel model assumes that there is a genie that 
provides the fading coefficients to the decoder; this assumption is difficult to achieve in practice. 
In this paper, we replace the role of the genie by a scheme that estimates the fading via pilot 
symbols. This can be viewed as a particular coding strategy over a non-coherent fading channel, 
i.e., a channel where neither communication end has access to the fading coefficients, although both
may be aware of the fading statistics. Note that with imperfect fading estimation, the nearest
neighbor decoder that treats the fading estimate as if it were perfect is not necessarily optimal. 
Nevertheless, we show that, in some cases, nearest neighbor decoding with pilot-aided channel 
estimation achieves the capacity pre-log of non-coherent fading channels. (The capacity pre-log 
is defined as the limiting ratio of the capacity to the logarithm of the SNR as the SNR tends to 
infinity.) 

The capacity of non-coherent fading channels has been studied in a number of works. Building
upon [6], Hassibi and Hochwald [7] studied the capacity of the block-fading channel and used
pilot symbols (also known as training symbols) to obtain reasonably accurate fading estimates.
Jindal and Lozano [8] provided tools for a unified treatment of pilot-based channel estimation
in both block and stationary fading channels with bandlimited power spectral densities. In these
works, lower bounds on the channel capacity were obtained. Lapidoth [3] studied a single-input
single-output (SISO) fading channel for more general stationary fading processes and showed
that, depending on the predictability of the fading process, the capacity growth in SNR can be,
inter alia, logarithmic or double logarithmic. The extension of [3] to multiple-input single-output
(MISO) fading channels can be found in [9]. A lower bound on the capacity of stationary MIMO
fading channels was derived by Etkin and Tse in [10].

January 8, 2013

DRAFT

Lapidoth and Shamai [11] and Weingarten et al. [12] studied non-coherent stationary fading
channels from a mismatched-decoding perspective. In particular, they studied achievable rates 
with Gaussian codebooks and nearest neighbor decoding. In both works, it is assumed that there 
is a genie that provides imperfect estimates of the fading coefficients. 

In this work, we add the estimation of the fading coefficients to the analysis. In particular, we 
study a communication system where the transmitter emits pilot symbols at regular intervals, and 
where the receiver separately performs channel estimation and data detection. Specifically, based 
on the channel outputs corresponding to pilot transmissions, the channel estimator produces 
estimates of the fading for the remaining time instants using a linear minimum mean-square 
error (LMMSE) interpolator. Using these estimates, the data detector employs a nearest neighbor 
decoder that detects the transmitted message. We study the achievable rates of this communication 
scheme at high SNR. In particular, we study the pre-log for fading processes with bandlimited 
power spectral densities. (The pre-log is defined as the limiting ratio of the achievable rate to 
the logarithm of the SNR as the SNR tends to infinity.) 

For SISO fading channels, using some simplifying arguments, Lozano [13] and Jindal and 
Lozano [8] showed that this scheme achieves the capacity pre-log. In this paper, we prove this 
result without any simplifying assumptions and extend it to MIMO fading channels. We show 
that the maximum rate pre-log with nearest neighbor decoding and pilot-aided channel estimation 
is given by the capacity pre-log of the coherent fading channel, $\min(n_t, n_r)$, times the fraction
of time used for the transmission of data. Hence, the loss with respect to the coherent case is
solely due to the transmission of pilots used to obtain accurate fading estimates. If the inverse
of twice the bandwidth of the fading process is an integer, then for MISO channels, the above
scheme achieves the capacity pre-log derived by Koch and Lapidoth [9]. For MIMO channels,
the above scheme achieves the best so far known lower bound on the capacity pre-log, obtained
in [10].

The rest of the paper is organized as follows. Section II describes the channel model and
introduces our transmission scheme along with nearest neighbor decoding and pilots for channel
estimation. Section III defines the pre-log and presents the main result. Section IV extends the
use of our scheme to a fading multiple-access channel (MAC). Sections V and VI provide the
proofs of our main results. Section VII summarizes the results and concludes the paper.
II. System Model and Transmission Scheme

We consider a discrete-time MIMO flat-fading channel with $n_t$ transmit antennas and $n_r$
receive antennas. Thus, the channel output at time instant $k \in \mathbb{Z}$ (where $\mathbb{Z}$ denotes the set of
integers) is the complex-valued $n_r$-dimensional random vector given by

$$\mathbf{Y}_k = \sqrt{\frac{\mathrm{SNR}}{n_t}}\, \mathbb{H}_k \mathbf{x}_k + \mathbf{Z}_k. \tag{1}$$

Here $\mathbf{x}_k \in \mathbb{C}^{n_t}$ denotes the time-$k$ channel input vector (with $\mathbb{C}$ denoting the set of complex
numbers), $\mathbb{H}_k$ denotes the $(n_r \times n_t)$-dimensional random fading matrix at time $k$, and $\mathbf{Z}_k$ denotes
the $n_r$-variate random additive noise vector at time $k$.

The noise process $\{\mathbf{Z}_k, k \in \mathbb{Z}\}$ is a sequence of independent and identically distributed (i.i.d.)
complex-Gaussian random vectors with zero mean and covariance matrix $\mathsf{I}_{n_r}$, where $\mathsf{I}_{n_r}$ is the
$n_r \times n_r$ identity matrix. SNR denotes the average SNR for each receive antenna.

The fading process $\{\mathbb{H}_k, k \in \mathbb{Z}\}$ is stationary, ergodic and complex-Gaussian. We assume that
the $n_r \cdot n_t$ processes $\{H_k(r,t), k \in \mathbb{Z}\}$, $r = 1, \ldots, n_r$, $t = 1, \ldots, n_t$ are independent and have
the same law, with each process having zero mean, unit variance, and power spectral density
$f_H(\lambda)$, $-\frac{1}{2} \le \lambda \le \frac{1}{2}$. Thus, $f_H(\cdot)$ is a non-negative (measurable) function satisfying

$$E\big[H_{k+m}(r,t)\, H_k^*(r,t)\big] = \int_{-1/2}^{1/2} e^{i 2\pi m \lambda} f_H(\lambda)\, d\lambda, \tag{2}$$

where $(\cdot)^*$ denotes complex conjugation, and where $i = \sqrt{-1}$. We further assume that the
power spectral density $f_H(\cdot)$ has bandwidth $\lambda_D < 1/2$, i.e., $f_H(\lambda) = 0$ for $|\lambda| > \lambda_D$ and
$f_H(\lambda) > 0$ otherwise. We finally assume that the fading process $\{\mathbb{H}_k, k \in \mathbb{Z}\}$ and the noise
process $\{\mathbf{Z}_k, k \in \mathbb{Z}\}$ are independent and that their joint law does not depend on $\{\mathbf{x}_k, k \in \mathbb{Z}\}$.

The transmission involves both codewords and pilots. The former convey the message to be
transmitted, and the latter are used to facilitate the estimation of the fading coefficients at the
receiver. We denote a codeword conveying a message $m$, $m \in \mathcal{M}$ (where $\mathcal{M} = \{1, \ldots, \lfloor e^{nR} \rfloor\}$
is the set of possible messages, and where $\lfloor b \rfloor$ denotes the largest integer smaller than or equal
to $b$) at rate $R$ by the length-$n$ sequence of input vectors $\mathbf{x}_1(m), \ldots, \mathbf{x}_n(m)$. The codeword
is selected from the codebook $\mathcal{C}$, which is drawn i.i.d. from an $n_t$-variate complex-Gaussian
distribution with zero mean and identity covariance matrix such that

$$\frac{1}{n} \sum_{k=1}^{n} E\big[\|\mathbf{X}_k(m)\|^2\big] = n_t, \quad m \in \mathcal{M}, \tag{3}$$

where $\|\cdot\|$ denotes the Euclidean norm.

Fig. 1: Structure of pilot and data transmission for $n_t = 2$, $L = 7$ and $T = 2$.

To estimate the fading matrix, we transmit orthogonal pilot vectors. The pilot vector $\mathbf{p}_t \in \mathbb{C}^{n_t}$
used to estimate the fading coefficients corresponding to the $t$-th transmit antenna is given by
$p_t(t) = 1$ and $p_t(t') = 0$ for $t' \ne t$. For example, the first pilot vector is $\mathbf{p}_1 = (1, 0, \ldots, 0)^{\mathsf{T}}$,
where $(\cdot)^{\mathsf{T}}$ denotes the transpose. To estimate the whole fading matrix, we thus need to send the
$n_t$ pilot vectors $\mathbf{p}_1, \ldots, \mathbf{p}_{n_t}$.

The transmission scheme is as follows. Every $L$ time instants (for some $L \in \mathbb{N}$, where $\mathbb{N}$
is the set of all positive integers), we transmit the $n_t$ pilot vectors $\mathbf{p}_1, \ldots, \mathbf{p}_{n_t}$. Each codeword
is then split up into blocks of $L - n_t$ data vectors, which are transmitted after the $n_t$ pilot
vectors. The process of transmitting $L - n_t$ data vectors and $n_t$ pilot vectors continues until all $n$
data vectors have been transmitted. Herein we assume that $n$ is an integer multiple of $L - n_t$
(Footnote 1). Prior to transmitting the first data block, and after transmitting the last data block,
we introduce a guard period of $L(T - 1)$ time instants (for some $T \in \mathbb{N}$), where we transmit every
$L$ time instants the $n_t$ pilot vectors $\mathbf{p}_1, \ldots, \mathbf{p}_{n_t}$, but we do not transmit data vectors in between.
The guard period ensures that, at every time instant, we can employ a channel estimator that bases
its estimation on the channel outputs corresponding to the $T$ past and the $T$ future pilot transmissions.
This facilitates the analysis and does not incur any loss in terms of achievable rates. The above
transmission scheme is illustrated in Fig. 1. The channel estimator is described in the following.

Note that the total block-length of the above transmission scheme (comprising data vectors,
pilot vectors and guard period) is given by

$$n' = n_p + n + n_g \tag{4}$$

where $n_p$ denotes the number of channel uses reserved for pilot vectors, and where $n_g$ denotes
the number of channel uses during the silent guard period, i.e.,

$$n_p = \left(\frac{n}{L - n_t} + 1 + 2(T - 1)\right) n_t, \tag{5}$$

$$n_g = 2(L - n_t)(T - 1). \tag{6}$$

Footnote 1: If $n$ is not an integer multiple of $L - n_t$, then the last $L - n_t$ instants are not fully used
by data vectors and therefore contain time instants where we do not transmit anything. The loss in
information rate incurred thereby vanishes as $n$ tends to infinity.
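The block-length bookkeeping in (4)-(6) can be checked numerically. The following is a minimal sketch, not part of the original analysis; the function name `block_lengths` and the example value $n = 10$ are illustrative assumptions, while $n_t = 2$, $L = 7$ and $T = 2$ are the parameters of Fig. 1.

```python
def block_lengths(n, n_t, L, T):
    """Return (n_p, n_g, n_total) following (4)-(6).

    n   : number of data vectors per codeword
    n_t : number of transmit antennas (pilot vectors per training period)
    L   : training period, assumed to satisfy L > n_t
    T   : number of past/future pilot transmissions used by the estimator
    """
    data_per_block = L - n_t
    if data_per_block <= 0:
        raise ValueError("L must exceed n_t so that each period carries data")
    if n % data_per_block != 0:
        raise ValueError("n is assumed to be an integer multiple of L - n_t")
    n_p = (n // data_per_block + 1 + 2 * (T - 1)) * n_t  # pilot uses, (5)
    n_g = 2 * data_per_block * (T - 1)                   # silent guard uses, (6)
    return n_p, n_g, n_p + n + n_g                       # total length, (4)


# Parameters of Fig. 1 with a hypothetical codeword length n = 10.
n_p, n_g, n_total = block_lengths(n=10, n_t=2, L=7, T=2)
```

For long codewords the ratio `n_p / n_total` approaches $n_t / L$, which is the fraction of time spent on pilots.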

We now turn to the decoder. Let $\mathcal{D}$ denote the set of time indices where data vectors of a
codeword are transmitted, and let $\mathcal{P}$ denote the set of time indices where pilots are transmitted.
The decoder consists of two parts: a channel estimator and a data detector. The channel estimator
considers the channel output vectors $\mathbf{Y}_{k'}$, $k' \in \mathcal{P}$ corresponding to the past and future $T$ pilot
transmissions and estimates $H_k(r,t)$ using a linear interpolator, so the estimate $\hat{H}_k^{(T)}(r,t)$ of the
fading coefficient $H_k(r,t)$ is given by

$$\hat{H}_k^{(T)}(r,t) = \sum_{\substack{k' = k - TL: \\ k' \in \mathcal{P}}}^{k + TL} a_{k'}(r,t)\, Y_{k'}(r) \tag{7}$$

where the coefficients $a_{k'}(r,t)$ are chosen in order to minimize the mean-squared error (Footnote 2).

Note that, since each pilot vector is transmitted from only one antenna, the fading coefficients
corresponding to all transmit and receive antenna pairs $(r, t)$ can be observed. Further note that,
since the fading processes $\{H_k(r,t), k \in \mathbb{Z}\}$, $r = 1, \ldots, n_r$, $t = 1, \ldots, n_t$ are independent,
estimating $H_k(r,t)$ based only on $\{Y_k(r), k \in \mathbb{Z}\}$ rather than on $\{\mathbf{Y}_k, k \in \mathbb{Z}\}$ incurs no loss in
optimality.
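To illustrate how the coefficients in (7) can be obtained, the following sketch solves the normal equations of the linear minimum mean-square error estimator for one antenna pair. It is a sketch under stated assumptions rather than the estimator analyzed in the paper: the power spectral density is assumed flat on $[-\lambda_D, \lambda_D]$ (so the autocorrelation is a sinc function), the pilots of the considered transmit antenna are assumed to sit at multiples of $L$, and the function name `lmmse_interpolator` and all parameter values are hypothetical.

```python
import numpy as np


def lmmse_interpolator(ell, L, T, snr, n_t, lambda_d):
    """LMMSE weights and error variance for one fading coefficient.

    The target time is `ell` (0 <= ell < L), measured from the most recent
    pilot of the considered transmit antenna. The T past and T future pilots
    are assumed to lie at times j*L, j = -(T-1), ..., T.
    Assumption: flat PSD on [-lambda_d, lambda_d], i.e. the autocorrelation
    of the unit-variance fading process is sinc(2*lambda_d*m).
    """
    pilot_times = L * np.arange(-(T - 1), T + 1)           # 2T pilot instants
    gain = np.sqrt(snr / n_t)                              # pilot amplitude in (1)

    def acf(m):
        return np.sinc(2.0 * lambda_d * m)                 # np.sinc(x) = sin(pi x)/(pi x)

    lags = pilot_times[:, None] - pilot_times[None, :]
    cov_y = gain**2 * acf(lags) + np.eye(pilot_times.size)  # covariance of pilot outputs
    cross = gain * acf(ell - pilot_times)                  # correlation with the target
    weights = np.linalg.solve(cov_y, cross)                # normal equations
    mse = 1.0 - float(cross @ weights)                     # resulting error variance
    return weights, mse


weights, mse_low_snr = lmmse_interpolator(ell=3, L=5, T=4, snr=10.0, n_t=2, lambda_d=0.05)
_, mse_high_snr = lmmse_interpolator(ell=3, L=5, T=4, snr=1000.0, n_t=2, lambda_d=0.05)
```

Because the fading autocorrelation is real in this example, the weights are real; for a general power spectral density the same equations hold with complex conjugates in the appropriate places.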

Footnote 2: It has been shown in [14] that for the linear interpolator in (7), only the observations
when pilots are transmitted, $\mathbf{Y}_{k'}$, $k' \in \mathcal{P}$, are relevant for fading estimation.

Since the time-lags between $\mathbb{H}_k$, $k \in \mathcal{D}$ and the observations $\mathbf{Y}_{k'}$, $k' \in \mathcal{P}$ depend on $k$, it
follows that the interpolation error

$$E_k^{(T)}(r,t) \triangleq H_k(r,t) - \hat{H}_k^{(T)}(r,t) \tag{8}$$

is not stationary but cyclo-stationary with period $L$. It can be shown that, irrespective of $r$, the
variance of the interpolation error

$$\epsilon_{\ell,T}^2(r,t) \triangleq E\Big[\big|H_k(r,t) - \hat{H}_k^{(T)}(r,t)\big|^2\Big] \tag{9}$$

tends to the following expression as $T$ tends to infinity [13]:

$$\epsilon_\ell^2(t) \triangleq \lim_{T \to \infty} \epsilon_{\ell,T}^2(r,t) \tag{10}$$

$$= 1 - \int_{-1/2}^{1/2} \frac{\mathrm{SNR}\, |f_{L,\ell-t+1}(\lambda)|^2}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda \tag{11}$$

where $\ell \triangleq k \bmod L$ denotes the remainder of $k/L$. Here $f_{L,\ell}(\cdot)$ is given by

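$$f_{L,\ell}(\lambda) = \frac{1}{L} \sum_{\nu=0}^{L-1} \bar{f}_H\!\left(\frac{\lambda - \nu}{L}\right) e^{i 2\pi \ell \frac{\lambda - \nu}{L}}, \quad \ell = 0, \ldots, L-1 \tag{12}$$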



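and $\bar{f}_H(\cdot)$ is the periodic continuation of $f_H(\cdot)$, i.e., it is the periodic function of period $[-1/2, 1/2)$
that coincides with $f_H(\lambda)$ for $-1/2 \le \lambda < 1/2$. If

$$L \le \frac{1}{2\lambda_D} \tag{13}$$

then $|f_{L,\ell}(\cdot)|$ becomes

$$|f_{L,\ell}(\lambda)| = f_{L,0}(\lambda) = \frac{1}{L} f_H\!\left(\frac{\lambda}{L}\right), \quad -\frac{1}{2} \le \lambda \le \frac{1}{2}. \tag{14}$$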



In this case, irrespective of $\ell$ and $t$, the variance of the interpolation error is given by

$$\epsilon_\ell^2(t) = \epsilon^2 = 1 - \int_{-1/2}^{1/2} \frac{\mathrm{SNR}\, [f_H(\lambda)]^2}{\mathrm{SNR}\, f_H(\lambda) + L n_t}\, d\lambda, \quad \ell = 0, \ldots, L-1, \; t = 1, \ldots, n_t \tag{15}$$

which vanishes as the SNR tends to infinity. Recall that $\lambda_D$ denotes the bandwidth of $f_H(\cdot)$.
Thus, (13) implies that no aliasing occurs as we undersample the fading process $L$ times. Note
that, in contrast to (11), the variance in (15) is independent of the transmit antenna index $t$. See
Section V-A for a more detailed discussion.
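The behavior of (15) can be illustrated numerically. The sketch below evaluates the integral with a midpoint rule for an assumed flat power spectral density $f_H(\lambda) = 1/(2\lambda_D)$ on $[-\lambda_D, \lambda_D]$, for which (15) reduces to the closed form $2\lambda_D L n_t / (\mathrm{SNR} + 2\lambda_D L n_t)$. The flat spectrum, the function names and the parameter values are illustrative assumptions, not choices made in the paper.

```python
import numpy as np


def flat_psd(lam, lambda_d):
    """Assumed example PSD: constant on [-lambda_d, lambda_d], zero elsewhere."""
    return np.where(np.abs(lam) <= lambda_d, 1.0 / (2.0 * lambda_d), 0.0)


def interpolation_error_variance(psd, snr, L, n_t, num=100_000):
    """Evaluate (15) with a midpoint rule on [-1/2, 1/2] (interval length 1)."""
    lam = (np.arange(num) + 0.5) / num - 0.5
    f = psd(lam)
    integrand = snr * f**2 / (snr * f + L * n_t)
    return 1.0 - float(np.mean(integrand))


lambda_d, L, n_t, snr = 0.05, 10, 2, 100.0          # here L = 1/(2*lambda_d)
eps2 = interpolation_error_variance(lambda lam: flat_psd(lam, lambda_d), snr, L, n_t)
closed_form = 2 * lambda_d * L * n_t / (snr + 2 * lambda_d * L * n_t)
```

In this example the product `eps2 * snr` stays below $2\lambda_D L n_t$, which reflects the decay of the error variance as the reciprocal of the SNR.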

The channel estimator feeds the sequence of fading estimates $\{\hat{\mathbb{H}}_k^{(T)}, k \in \mathcal{D}\}$ (which is
composed of the matrix entries $\{\hat{H}_k^{(T)}(r,t), k \in \mathcal{D}\}$) to the data detector. We shall denote its
realization by $\{\hat{\mathsf{H}}_k^{(T)}, k \in \mathcal{D}\}$. Based on the channel outputs $\{\mathbf{y}_k, k \in \mathcal{D}\}$ and fading estimates
$\{\hat{\mathsf{H}}_k^{(T)}, k \in \mathcal{D}\}$, the data detector uses a nearest neighbor decoder to guess which message was
transmitted. Thus, the decoder decides on the message $\hat{m}$ that satisfies

$$\hat{m} = \arg\min_{m \in \mathcal{M}} D(m) \tag{16}$$

where

$$D(m) \triangleq \sum_{k \in \mathcal{D}} \left\| \mathbf{y}_k - \sqrt{\frac{\mathrm{SNR}}{n_t}}\, \hat{\mathsf{H}}_k^{(T)} \mathbf{x}_k(m) \right\|^2. \tag{17}$$

On the right-hand side of (17), assuming that the first pilot symbol is transmitted at time $k = 0$, we have
defined

$$\mathcal{D} = \{0, \ldots, n' - 1\} \cap \mathcal{P}^{c} \tag{18}$$

as the set of time indices for a single codeword transmission.
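The decision rule (16)-(17) can be sketched in a few lines. The example below is an illustration only: the array layout, the function name `nearest_neighbor_decode` and the toy parameters are assumptions, and the check uses a noiseless channel with perfect fading estimates so that the transmitted message has distance zero.

```python
import numpy as np


def nearest_neighbor_decode(y, h_hat, codebook, snr):
    """Return the message index minimizing the distance D(m) in (17).

    y        : (n, n_r) channel outputs at the data time instants
    h_hat    : (n, n_r, n_t) fading estimates, treated as if they were perfect
    codebook : (M, n, n_t) codewords x_k(m)
    """
    n_t = codebook.shape[2]
    scale = np.sqrt(snr / n_t)
    # predicted[m, k, :] = sqrt(SNR / n_t) * h_hat[k] @ codebook[m, k]
    predicted = scale * np.einsum("krt,mkt->mkr", h_hat, codebook)
    distances = np.sum(np.abs(y[None, :, :] - predicted) ** 2, axis=(1, 2))
    return int(np.argmin(distances))


# Toy example with hypothetical sizes: 8 messages, 6 data instants, 2x2 antennas.
rng = np.random.default_rng(0)
M, n, n_r, n_t, snr = 8, 6, 2, 2, 10.0
codebook = (rng.standard_normal((M, n, n_t))
            + 1j * rng.standard_normal((M, n, n_t))) / np.sqrt(2)
h = (rng.standard_normal((n, n_r, n_t))
     + 1j * rng.standard_normal((n, n_r, n_t))) / np.sqrt(2)
sent = 5
y = np.sqrt(snr / n_t) * np.einsum("krt,kt->kr", h, codebook[sent])  # no noise
decoded = nearest_neighbor_decode(y, h, codebook, snr)
```

With imperfect estimates the same rule is applied unchanged, which is why the decoder is in general mismatched to the channel.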

III. The Pre-Log

We say that a rate

$$R(\mathrm{SNR}) \triangleq \frac{\log |\mathcal{M}|}{n} \tag{19}$$

is achievable if there exists a code with $\lfloor e^{nR} \rfloor$ codewords such that the error probability tends
to zero as the codeword length $n$ tends to infinity. In this work, we study the set of rates that
are achievable with nearest neighbor decoding and pilot-aided channel estimation. We focus on
the achievable rates at high SNR. In particular, we are interested in the maximum achievable
pre-log, defined as

$$\Pi_{R^*} \triangleq \limsup_{\mathrm{SNR} \to \infty} \frac{R^*(\mathrm{SNR})}{\log \mathrm{SNR}} \tag{20}$$

where $R^*(\mathrm{SNR})$ is the maximum achievable rate, maximized over all possible encoders.

The capacity pre-log, which is given by (20) but with $R^*(\mathrm{SNR})$ replaced by the capacity
$C(\mathrm{SNR})$ (Footnote 3), of SISO fading channels was computed by Lapidoth [3] as

$$\Pi_C = \mu(\{\lambda : f_H(\lambda) = 0\}) \tag{21}$$

where $\mu(\cdot)$ denotes the Lebesgue measure on the interval $[-1/2, 1/2]$. Koch and Lapidoth [9]
extended this result to MISO fading channels and showed that if the fading processes
$\{H_k(t), k \in \mathbb{Z}\}$, $t = 1, \ldots, n_t$ are independent and have the same law, then the capacity pre-log of MISO
fading channels is equal to the capacity pre-log of the SISO fading channel with fading process
$\{H_k(1), k \in \mathbb{Z}\}$. Using (21), the capacity pre-log of MISO fading channels with bandlimited
power spectral densities of bandwidth $\lambda_D$ can be evaluated as

$$\Pi_C = 1 - 2\lambda_D. \tag{22}$$

Since $R^*(\mathrm{SNR}) \le C(\mathrm{SNR})$, it follows that $\Pi_{R^*} \le \Pi_C$.

Footnote 3: The capacity is defined as the supremum of all achievable rates maximized over all possible
encoders and decoders.

To the best of our knowledge, the capacity pre-log of MIMO fading channels is unknown.
For independent fading processes $\{H_k(r,t), k \in \mathbb{Z}\}$, $t = 1, \ldots, n_t$, $r = 1, \ldots, n_r$ that have the
same law, the best so far known lower bound on the MIMO pre-log is due to Etkin and Tse
[10], and is given by

$$\Pi_C \ge \min(n_t, n_r) \Big(1 - \min(n_t, n_r)\, \mu(\{\lambda : f_H(\lambda) > 0\})\Big). \tag{23}$$

For power spectral densities that are bandlimited to $\lambda_D$, this becomes

$$\Pi_C \ge \min(n_t, n_r) \big(1 - \min(n_t, n_r)\, 2\lambda_D\big). \tag{24}$$

Observe that (24) specializes to (22) for $n_r = 1$. It should be noted that the capacity pre-log
for MISO and SISO fading channels was derived under a peak-power constraint on the channel 
inputs, whereas the lower bound on the capacity pre-log for MIMO fading channels was derived 
under an average-power constraint. Clearly, the capacity pre-log corresponding to a peak-power 
constraint can never be larger than the capacity pre-log corresponding to an average-power 
constraint. It is believed that the two pre-logs are in fact identical (see the conclusion in 

In this paper, we show that a communication scheme that employs nearest neighbor decoding 
and pilot-aided channel estimation achieves the following pre-log. 

Theorem 1. Consider the Gaussian MIMO flat-fading channel with $n_t$ transmit antennas and
$n_r$ receive antennas (1). Then, the transmission and decoding scheme described in Section II
achieves

$$\Pi_{R^*} \ge \min(n_t, n_r) \left(1 - \frac{\min(n_t, n_r)}{L^*}\right) \tag{25}$$

where $L^* = \left\lfloor \frac{1}{2\lambda_D} \right\rfloor$.
Proof: See Section V. ■

Remark 1. We derive Theorem 1 for i.i.d. Gaussian codebooks, which satisfy the average-power
constraint (3). Nevertheless, it can be shown that Theorem 1 continues to hold when the
channel inputs satisfy a peak-power constraint. More specifically, we show in Section V-C that a
sufficient condition on the input distribution with power constraint $E[\|\mathbf{X}\|^2] \le n_t$ for achieving
the pre-log is that its probability density function (p.d.f.) $p_{\mathbf{X}}(\mathbf{x})$ satisfies

$$p_{\mathbf{X}}(\mathbf{x}) \le \frac{K}{\pi^{n_t}} e^{-\|\mathbf{x}\|^2}, \quad \mathbf{x} \in \mathbb{C}^{n_t} \tag{26}$$

for some $K$ satisfying

$$\lim_{\mathrm{SNR} \to \infty} \frac{\log K}{\log \mathrm{SNR}} = 0. \tag{27}$$

The condition (26) is satisfied, for example, by truncated Gaussian inputs, for which the $n_t$
elements in $\mathbf{X}$ are independent and identically distributed and

$$p_{\mathbf{X}}(\mathbf{x}) = \frac{K}{\pi^{n_t}} e^{-\|\mathbf{x}\|^2}, \quad \mathbf{x} \in \{\mathbf{x} \in \mathbb{C}^{n_t} : |x(t)| \le 1,\ 1 \le t \le n_t\} \tag{28}$$

with

$$K = \left(\int_{|\tilde{x}| \le 1} \frac{1}{\pi} e^{-|\tilde{x}|^2}\, d\tilde{x}\right)^{-n_t}. \tag{29}$$

If $1/(2\lambda_D)$ is an integer, then (25) becomes

$$\Pi_{R^*} \ge \min(n_t, n_r) \big(1 - \min(n_t, n_r)\, 2\lambda_D\big). \tag{30}$$

Thus, in this case nearest neighbor decoding together with pilot-aided channel estimation achieves
the capacity pre-log of MISO fading channels (22) as well as the lower bound on the capacity
pre-log of MIMO fading channels (24).
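As a small worked example of (25) and (30), the following sketch evaluates the right-hand side of (25) in exact rational arithmetic. The function name `prelog_lower_bound` is an illustrative assumption, and the bandwidth is expected as a `Fraction` (or a string such as `"1/20"`), since passing a binary float could shift the floor in $L^*$.

```python
from fractions import Fraction
from math import floor


def prelog_lower_bound(n_t, n_r, lambda_d):
    """Right-hand side of (25): m * (1 - m / L*), with m = min(n_t, n_r)."""
    lam = Fraction(lambda_d)
    if not 0 < lam < Fraction(1, 2):
        raise ValueError("the bandwidth must satisfy 0 < lambda_d < 1/2")
    l_star = floor(1 / (2 * lam))          # L* = floor(1 / (2 * lambda_D))
    m = min(n_t, n_r)
    return m * (1 - Fraction(m, l_star))   # may be negative if m exceeds L*


# Hypothetical bandwidth lambda_D = 1/20, so that 1 / (2 * lambda_D) = 10.
mimo_bound = prelog_lower_bound(2, 2, Fraction(1, 20))   # 2 * (1 - 2/10)
miso_bound = prelog_lower_bound(2, 1, Fraction(1, 20))   # 1 - 2 * lambda_D
```

For the MISO case the value coincides with $1 - 2\lambda_D$ in (22), since $1/(2\lambda_D)$ is an integer in this example.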

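Suppose that both the transmitter and the receiver use the same number of antennas, namely
$n_t' = n_r' = \min(n_t, n_r)$. Then, as the codeword length tends to infinity, we have from (4)-(6)
that the fraction of time consumed for the transmission of pilots is given by

$$\lim_{n \to \infty} \frac{n_p}{n'} = \lim_{n \to \infty} \frac{\left(\frac{n}{L - n_t'} + 1 + 2(T-1)\right) n_t'}{\left(\frac{n}{L - n_t'} + 1 + 2(T-1)\right) n_t' + n + 2(L - n_t')(T-1)} = \frac{n_t'}{L}. \tag{31}$$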
Consequently, from the achievable pre-log (25), namely

$$\Pi_{R^*} \ge n_t' \left(1 - \frac{n_t'}{L}\right), \quad L \le \frac{1}{2\lambda_D}, \tag{32}$$

we observe that the loss compared to the capacity pre-log of the coherent fading channel, $n_t' =
\min(n_t, n_r)$, is given by the fraction of time used for the transmission of pilots. From this we
infer that the nearest neighbor decoder in combination with the channel estimator described in
Section II is optimal at high SNR in the sense that it achieves the capacity pre-log of the coherent
fading channel. This further implies that the achievable pre-log in Theorem 1 is the best pre-log
that can be achieved by any scheme employing $n_t'$ pilot vectors.

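To achieve the pre-log in Theorem 1, we assume that the training period $L$ satisfies $L \le \frac{1}{2\lambda_D}$,
in which case the variance of the interpolation error (15), namely

$$\epsilon^2 = \int_{-1/2}^{1/2} \frac{f_H(\lambda)\, L n_t}{\mathrm{SNR}\, f_H(\lambda) + L n_t}\, d\lambda \le \frac{2\lambda_D L n_t}{\mathrm{SNR}}, \tag{33}$$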

vanishes as the inverse of the SNR. The achievable pre-log is then maximized by choosing the largest
$L$ satisfying $L \le \frac{1}{2\lambda_D}$. Note that as a criterion of "perfect side information" for nearest neighbor
decoding in fading channels, Lapidoth and Shamai [11] suggested that the variance of the fading
estimation error should be negligible compared to the reciprocal of the SNR. Using the linear
interpolator (7), we obtain an estimation error with variance decaying as the reciprocal of the SNR
provided that $L \le \frac{1}{2\lambda_D}$. Thus, the condition $L \le \frac{1}{2\lambda_D}$ can be viewed as a sufficient condition for
obtaining "nearly perfect side information" in the sense that the variance of the interpolation error is
of the same order as the reciprocal of the SNR.

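Of course, one could increase the training period $L$ beyond $\frac{1}{2\lambda_D}$. Indeed, by increasing $L$, we
could reduce the rate loss due to the transmission of pilots as indicated in (32), at the cost of
obtaining a larger fading estimation error, which in turn may reduce the reliability of the nearest
neighbor decoder. To understand this trade-off better, we shall analyze the achievable pre-log
when $L > \frac{1}{2\lambda_D}$. Note that for $L > \frac{1}{2\lambda_D}$, the variance of the interpolation error follows from (11):

$$\epsilon_\ell^2(t) = 1 - \int_{-1/2}^{1/2} \frac{\mathrm{SNR}\, |f_{L,\ell-t+1}(\lambda)|^2}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda \tag{34}$$

$$= \int_{-1/2}^{1/2} \frac{n_t f_{L,0}(\lambda)}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda + \int_{-1/2}^{1/2} \frac{\mathrm{SNR} \big([f_{L,0}(\lambda)]^2 - |f_{L,\ell-t+1}(\lambda)|^2\big)}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda. \tag{35}$$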
The former integral

$$\int_{-1/2}^{1/2} \frac{n_t f_{L,0}(\lambda)}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda \le \frac{n_t}{\mathrm{SNR}} \tag{36}$$

vanishes as the SNR tends to infinity. However, we prove in Appendix B that as the SNR tends
to infinity, the latter integral

$$\int_{-1/2}^{1/2} \frac{\mathrm{SNR} \big([f_{L,0}(\lambda)]^2 - |f_{L,\ell-t+1}(\lambda)|^2\big)}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda \tag{37}$$

is bounded away from zero. This implies that the interpolation error (35) does not vanish as
the SNR tends to infinity, and the pre-log achievable with the scheme described in Section II
is zero. It thus follows that the condition $L \le \frac{1}{2\lambda_D}$ is necessary in order to achieve a positive
pre-log.

Comparing (25) and (24) with the capacity pre-log $\min(n_t, n_r)$ for coherent fading channels
[1], [2], we observe that, for a fading process of bandwidth $\lambda_D$, the penalty for not knowing
the fading coefficients is roughly $(\min(n_t, n_r))^2 \cdot 2\lambda_D$. Consequently, the lower bound (25) does
not grow linearly with $\min(n_t, n_r)$, but it is a quadratic function of $\min(n_t, n_r)$ that achieves its
maximum at

$$\min(n_t, n_r) = \frac{L^*}{2}. \tag{38}$$

This gives rise to the lower bound

$$\Pi_{R^*} \ge \frac{L^*}{4} \tag{39}$$

which cannot be larger than $1/(8\lambda_D)$. The same holds for the lower bound (23).
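The maximization leading to (38) and (39) can be illustrated by a brute-force search over integer antenna counts. This is a sketch under the assumption that $m = \min(n_t, n_r)$ ranges over $1, \ldots, L^*$; the function name `best_antenna_count` and the values of $L^*$ are hypothetical.

```python
from fractions import Fraction


def best_antenna_count(l_star):
    """Maximize m * (1 - m / L*) from (25) over integers m = 1, ..., L*."""
    def bound(m):
        return Fraction(m * (l_star - m), l_star)

    best_m = max(range(1, l_star + 1), key=bound)   # first maximizer on ties
    return best_m, bound(best_m)


even_case = best_antenna_count(10)   # L* even: maximum at m = L*/2
odd_case = best_antenna_count(7)     # L* odd: L*/2 is not an integer
```

When $L^*$ is odd, the integer constraint keeps the best value slightly below $L^*/4$.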

IV. Fading Multiple-Access Channels 

In this section, we extend the use of nearest neighbor decoding with pilot-aided channel 
estimation to the fading MAC. We are interested in the achievable pre-log region that can be 
achieved with this scheme. 

We consider a two-user MIMO fading MAC, where two terminals wish to communicate with
a third one, and where the channels between the terminals are MIMO fading channels. The extension
to more than two users is straightforward. The first user has $n_{t,1}$ antennas, the second user has
$n_{t,2}$ antennas and the receiver has $n_r$ antennas. The channel model is depicted in Fig. 2. The
channel output at time instant $k \in \mathbb{Z}$ is a complex-valued $n_r$-dimensional random vector given
by

$$\mathbf{Y}_k = \sqrt{\mathrm{SNR}}\, \mathbb{H}_{1,k} \mathbf{x}_{1,k} + \sqrt{\mathrm{SNR}}\, \mathbb{H}_{2,k} \mathbf{x}_{2,k} + \mathbf{Z}_k. \tag{40}$$

Here $\mathbf{x}_{s,k} \in \mathbb{C}^{n_{t,s}}$ denotes the time-$k$ channel input vector corresponding to User $s$, $s = 1, 2$; $\mathbb{H}_{s,k}$
denotes the $(n_r \times n_{t,s})$-dimensional fading matrix at time $k$ corresponding to User $s$, $s = 1, 2$;
SNR denotes the average SNR for each transmit antenna; and $\mathbf{Z}_k$ denotes the $n_r$-variate additive
noise vector at time $k$. The fading processes $\{\mathbb{H}_{s,k}, k \in \mathbb{Z}\}$, $s = 1, 2$ are independent of each
other and of the noise process $\{\mathbf{Z}_k, k \in \mathbb{Z}\}$, and follow the same setup as the one used in the
point-to-point channel (Section II).

Fig. 2: The two-user MIMO fading MAC diagram.

Both users transmit codewords and pilot symbols over the channel (40). To transmit the
message $m_s \in \{1, \ldots, \lfloor e^{nR_s} \rfloor\}$, $s = 1, 2$ (where $m_1$ and $m_2$ are drawn independently), each
user's encoder selects a codeword of length $n$ from a codebook $\mathcal{C}_s$, where $\mathcal{C}_s$, $s = 1, 2$ are drawn
i.i.d. from an $n_{t,s}$-variate, zero-mean, complex-Gaussian distribution of covariance matrix $\mathsf{I}_{n_{t,s}}$.
Similar to the single-user case, orthogonal pilot vectors are used. The pilot vector $\mathbf{p}_{s,t} \in \mathbb{C}^{n_{t,s}}$,
$s = 1, 2$, $t = 1, \ldots, n_{t,s}$ used to estimate the fading coefficients from transmit antenna $t$ of User
$s$ is given by $p_{s,t}(t) = 1$ and $p_{s,t}(t') = 0$ for $t' \ne t$. For example, the first pilot vector of User
$s$ is given by $\mathbf{p}_{s,1} = (1, 0, \ldots, 0)^{\mathsf{T}}$. To estimate the fading matrices $\mathbb{H}_{1,k}$ and $\mathbb{H}_{2,k}$, each training period
requires transmission of $(n_{t,1} + n_{t,2})$ pilot vectors $\mathbf{p}_{1,1}, \ldots, \mathbf{p}_{1,n_{t,1}}, \mathbf{p}_{2,1}, \ldots, \mathbf{p}_{2,n_{t,2}}$.

Assuming transmission from both users is synchronized, the transmission scheme extends the
point-to-point setup in Section II to the two-user MAC setup as illustrated in Fig. 3. Every $L$
time instants (for some $L > n_{t,1} + n_{t,2}$, $L \in \mathbb{N}$), User 1 first transmits the $n_{t,1}$ pilot vectors
$\mathbf{p}_{1,1}, \ldots, \mathbf{p}_{1,n_{t,1}}$. Once the transmission of the $n_{t,1}$ pilot vectors ends, User 2 transmits its $n_{t,2}$
pilot vectors $\mathbf{p}_{2,1}, \ldots, \mathbf{p}_{2,n_{t,2}}$. The codewords for both users are then split up into blocks of
$(L - n_{t,1} - n_{t,2})$ data vectors, which are transmitted simultaneously after the $(n_{t,1} + n_{t,2})$ pilot
vectors. The process of transmitting $(L - n_{t,1} - n_{t,2})$ data vectors and $(n_{t,1} + n_{t,2})$ pilot vectors
continues until all $n$ data symbols have been transmitted. Herein we assume that $n$ is an integer multiple
of $(L - n_{t,1} - n_{t,2})$ (Footnote 4). Prior to transmitting the first data block, and after transmitting the last
data block, a guard period of $L(T - 1)$ time instants (for some $T \in \mathbb{N}$) is introduced for the
purpose of channel estimation, where we transmit every $L$ time instants the $(n_{t,1} + n_{t,2})$ pilot
vectors but we do not transmit data vectors in between. Note that codewords from both users are
jointly transmitted at the same time instants whereas pilots from both users do not interfere and
are separately transmitted at different time instants. The total block-length of this transmission
scheme (comprising data vectors, pilot vectors and guard period) is given by

$$n' = n_p + n + n_g \tag{41}$$

where $n_p$ and $n_g$ are

$$n_p = \left(\frac{n}{L - n_{t,1} - n_{t,2}} + 1 + 2(T - 1)\right)(n_{t,1} + n_{t,2}), \tag{42}$$

$$n_g = 2(L - n_{t,1} - n_{t,2})(T - 1). \tag{43}$$
Footnote 4: As in the point-to-point setup, this assumption is not critical in terms of rate, cf. Footnote 1.

Similarly to the single-user case, the receiver guesses which messages have been transmitted
using a two-part decoder that consists of a channel estimator and a data detector. The channel
estimator first obtains matrix-valued fading estimates $\{\hat{\mathbb{H}}_{s,k}^{(T)}, k \in \mathcal{D}\}$, $s = 1, 2$ from the received
pilots $\mathbf{Y}_{k'}$, $k' \in \mathcal{P}$ using the same linear interpolator as (7). From the received codeword
$\{\mathbf{y}_k, k \in \mathcal{D}\}$ and the channel-estimate matrices $\{\hat{\mathsf{H}}_{s,k}^{(T)}, k \in \mathcal{D}\}$, $s = 1, 2$ (which are the realizations of
$\{\hat{\mathbb{H}}_{s,k}^{(T)}, k \in \mathcal{D}\}$, $s = 1, 2$), the decoder chooses the pair of messages $(\hat{m}_1, \hat{m}_2)$ that minimizes the
distance metric

$$(\hat{m}_1, \hat{m}_2) = \arg\min_{(m_1, m_2)} D(m_1, m_2) \tag{44}$$
Fig. 3: Structure of joint-transmission scheme, $n_{t,1} = 2$, $n_{t,2} = 1$, $L = 7$ and $T = 2$. (Legend: pilot, data, no transmission.)

Fig. 4: Structure of TDMA scheme, $n_{t,1} = 2$, $n_{t,2} = 1$, $L = 4$ and $T = 2$. (Legend: pilot, data, no transmission.)

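where

$$D(m_1, m_2) \triangleq \sum_{k \in \mathcal{D}} \left\| \mathbf{y}_k - \sqrt{\mathrm{SNR}}\, \hat{\mathsf{H}}_{1,k}^{(T)} \mathbf{x}_{1,k}(m_1) - \sqrt{\mathrm{SNR}}\, \hat{\mathsf{H}}_{2,k}^{(T)} \mathbf{x}_{2,k}(m_2) \right\|^2 \tag{45}$$

and where $\mathcal{D}$ is defined in the same way as (18). In the following, we will refer to the above
communication scheme as the joint-transmission scheme.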

We shall compare the joint-transmission scheme with a time-division multiple-access (TDMA)
scheme, where each user transmits its message using the transmission scheme illustrated in
Fig. 4. Specifically, during the first $\beta n'$ channel uses (for some $0 \le \beta \le 1$), User 1 transmits
its codeword according to the transmission scheme given in Section II (see also Fig. 1), while
User 2 is silent. (Here $n'$ is given in (41).) Then, during the next $(1 - \beta) n'$ channel uses, User
2 transmits its codeword according to the same transmission scheme, while User 1 is silent. In
both cases, the receiver guesses the corresponding message $m_s$, $s = 1, 2$ using a nearest neighbor
decoder and pilot-aided channel estimation.
A. The MAC Pre-Log

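Let $R_1^*(\mathrm{SNR})$, $R_2^*(\mathrm{SNR})$ and $R_{1+2}^*(\mathrm{SNR})$ be the maximum achievable rate for User 1, the
maximum achievable rate for User 2 and the maximum achievable sum-rate, respectively. The
achievable-rate region is given by the closure of the convex hull of the set [15]

$$\mathcal{R} = \Big\{ R_1(\mathrm{SNR}), R_2(\mathrm{SNR}) : \; R_1(\mathrm{SNR}) \le R_1^*(\mathrm{SNR}), \; R_2(\mathrm{SNR}) \le R_2^*(\mathrm{SNR}), \; R_1(\mathrm{SNR}) + R_2(\mathrm{SNR}) \le R_{1+2}^*(\mathrm{SNR}) \Big\}. \tag{46}$$

We are interested in the pre-logs of $R_1(\mathrm{SNR})$ and $R_2(\mathrm{SNR})$, defined as the limiting ratios of
$R_1(\mathrm{SNR})$ and $R_2(\mathrm{SNR})$ to the logarithm of the SNR as the SNR tends to infinity. Thus, the
pre-log region is given by the closure of the convex hull of the set

$$\Pi_{\mathcal{R}} = \Big\{ \Pi_{R_1}, \Pi_{R_2} : \; \Pi_{R_1} \le \Pi_{R_1^*}, \; \Pi_{R_2} \le \Pi_{R_2^*}, \; \Pi_{R_1} + \Pi_{R_2} \le \Pi_{R_{1+2}^*} \Big\} \tag{47}$$

where

$$\Pi_{R_1^*} \triangleq \limsup_{\mathrm{SNR} \to \infty} \frac{R_1^*(\mathrm{SNR})}{\log \mathrm{SNR}}, \tag{48}$$

$$\Pi_{R_2^*} \triangleq \limsup_{\mathrm{SNR} \to \infty} \frac{R_2^*(\mathrm{SNR})}{\log \mathrm{SNR}}, \tag{49}$$

$$\Pi_{R_{1+2}^*} \triangleq \limsup_{\mathrm{SNR} \to \infty} \frac{R_{1+2}^*(\mathrm{SNR})}{\log \mathrm{SNR}}. \tag{50}$$

The capacity pre-logs $\Pi_{C_1}$, $\Pi_{C_2}$ and $\Pi_{C_{1+2}}$ are defined in the same way but with $R_1^*(\mathrm{SNR})$,
$R_2^*(\mathrm{SNR})$ and $R_{1+2}^*(\mathrm{SNR})$ replaced by the respective capacities $C_1(\mathrm{SNR})$, $C_2(\mathrm{SNR})$ and
$C_{1+2}(\mathrm{SNR})$.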

We next present our result on the pre-log region of the two-user MIMO fading MAC achievable 
with the joint-transmission scheme. 

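Theorem 2. Consider the MIMO fading MAC model (40). Then, the pre-log region achievable
with the joint-transmission scheme is the closure of the convex hull of the set

$$\Big\{ \Pi_{R_1}, \Pi_{R_2} : \; \Pi_{R_1} \le \min(n_r, n_{t,1}) \Big(1 - \frac{n_{t,1} + n_{t,2}}{L^*}\Big), \; \Pi_{R_2} \le \min(n_r, n_{t,2}) \Big(1 - \frac{n_{t,1} + n_{t,2}}{L^*}\Big), \; \Pi_{R_1} + \Pi_{R_2} \le \min(n_r, n_{t,1} + n_{t,2}) \Big(1 - \frac{n_{t,1} + n_{t,2}}{L^*}\Big) \Big\} \tag{51}$$

where $L^* = \left\lfloor \frac{1}{2\lambda_D} \right\rfloor$.

Proof: See Section VI. ■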
The pre-log region given in Theorem 2 is the largest region achievable with any transmission
scheme that uses $(n_{t,1} + n_{t,2})/L^*$ of the time for transmitting pilot symbols. Indeed, even if the
channel estimator were able to estimate the fading coefficients perfectly, and even if we
could decode the data symbols using a maximum-likelihood decoder, the capacity pre-log region
(without pilot transmission) would be given by the closure of the convex hull of the set El, [El, 

$$\Big\{ (\Pi_{R_1}, \Pi_{R_2}) : \; \Pi_{R_1} \le \min(n_r, n_{t,1}), \; \Pi_{R_2} \le \min(n_r, n_{t,2}), \; \Pi_{R_1} + \Pi_{R_2} \le \min(n_r, n_{t,1} + n_{t,2}) \Big\} \tag{52}$$

which, after multiplying by $1 - (n_{t,1} + n_{t,2})/L^*$ in order to account for the pilot symbols, becomes
(51). Thus, in order to improve upon (51), one would need to design a transmission scheme that
employs fewer than $(n_{t,1} + n_{t,2})/L^*$ pilot symbols per channel use.
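The region in (51) is described by three linear constraints, so membership of a pre-log pair can be tested directly. The following is a minimal sketch assuming non-negative pre-logs and exact rational inputs; the function name `joint_region_contains` and the example values are hypothetical.

```python
from fractions import Fraction


def joint_region_contains(pi_1, pi_2, n_r, n_t1, n_t2, l_star):
    """Check whether the pre-log pair (pi_1, pi_2) satisfies the constraints in (51)."""
    data_fraction = 1 - Fraction(n_t1 + n_t2, l_star)   # time left after pilots
    return (
        0 <= pi_1 <= min(n_r, n_t1) * data_fraction
        and 0 <= pi_2 <= min(n_r, n_t2) * data_fraction
        and pi_1 + pi_2 <= min(n_r, n_t1 + n_t2) * data_fraction
    )


# Hypothetical setting: single-antenna users and L* = 10.
corner = (Fraction(4, 5), Fraction(4, 5))
two_rx = joint_region_contains(*corner, n_r=2, n_t1=1, n_t2=1, l_star=10)
one_rx = joint_region_contains(*corner, n_r=1, n_t1=1, n_t2=1, l_star=10)
```

With two receive antennas the corner point is inside the region, whereas with a single receive antenna the sum constraint excludes it.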

Remark 2 (TDMA Pre-Log). Consider the MIMO fading MAC model (40). Then, the pre-log
region achievable with the TDMA scheme employing nearest neighbor decoding and pilot-aided
channel estimation is the closure of the convex hull of the set

$$\Big\{ \Pi_{R_1}, \Pi_{R_2} : \; \Pi_{R_1} \le \beta \min(n_r, n_{t,1}) \Big(1 - \frac{\min(n_r, n_{t,1})}{L^*}\Big), \; \Pi_{R_2} \le (1 - \beta) \min(n_r, n_{t,2}) \Big(1 - \frac{\min(n_r, n_{t,2})}{L^*}\Big), \; 0 \le \beta \le 1 \Big\} \tag{53}$$

where $L^* = \left\lfloor \frac{1}{2\lambda_D} \right\rfloor$. This follows directly from the pre-log of the point-to-point MIMO fading
channel (Theorem 1), where the number of transmit antennas of Users 1 and 2 is given by
$n_{t,1}$ and $n_{t,2}$, respectively.

Note that the sum of the pre-logs $\Pi_{R_1^*} + \Pi_{R_2^*}$ is upper-bounded by the capacity pre-log of the
point-to-point MIMO fading channel with $(n_{t,1} + n_{t,2})$ transmit antennas and $n_r$ receive antennas,
since the point-to-point MIMO channel allows for cooperation between the transmitting terminals.
While the capacity pre-log of point-to-point MIMO fading channels remains an open problem,
the capacity pre-log of point-to-point MISO fading channels is known, cf. (22). It thus follows
from (22) that, for $n_r = n_{t,1} = n_{t,2} = 1$, we have

$$\Pi_{R_1^*} + \Pi_{R_2^*} \le \Pi_{C_{1+2}} = 1 - 2\lambda_D \tag{54}$$

which together with the single-user constraints

$$\Pi_{R_1^*} \le \Pi_{C_1} = 1 - 2\lambda_D \tag{55}$$
$$\Pi_{R_2^*} \le \Pi_{C_2} = 1 - 2\lambda_D \tag{56}$$

implies that TDMA achieves the capacity pre-log region of the SISO fading MAC. The next
section provides a more detailed comparison between the joint-transmission scheme and TDMA.
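
The SISO statement can be checked numerically with a small sketch of the TDMA boundary points in (53). The helper name and the value $\lambda_D = 0.125$ are hypothetical; the value is chosen so that $1/(2\lambda_D)$ is an integer, in which case the TDMA sum pre-log $1 - 1/L^*$ coincides with $1 - 2\lambda_D$ in (54).

```python
from math import floor


def tdma_prelog_pair(beta, n_r, n_t1, n_t2, l_star):
    """Boundary point of the TDMA region (53) for a time-sharing factor beta."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    p1 = beta * min(n_r, n_t1) * (1.0 - n_t1 / l_star)
    p2 = (1.0 - beta) * min(n_r, n_t2) * (1.0 - n_t2 / l_star)
    return p1, p2


if __name__ == "__main__":
    lam_d = 0.125                      # hypothetical bandwidth of the fading PSD
    l_star = floor(1.0 / (2.0 * lam_d))
    for beta in (0.0, 0.3, 1.0):
        p1, p2 = tdma_prelog_pair(beta, 1, 1, 1, l_star)
        # For the SISO MAC the sum does not depend on beta.
        print(beta, p1 + p2, 1.0 - 2.0 * lam_d)
```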

B. Joint Transmission versus TDMA 

In this section, we discuss how the joint-transmission scheme performs compared to TDMA. To
this end, we compare the sum-rate pre-log $\Pi_{R_{1+2}^*}$ of the joint-transmission scheme (Theorem 2)
with the sum-rate pre-log of the TDMA scheme employing nearest neighbor decoding and
pilot-aided channel estimation (Remark 2) as well as with the sum-rate pre-log of the coherent
TDMA scheme, where the receiver has knowledge of the realizations of the fading processes
$\{\mathbf{H}_{s,k},\ k \in \mathbb{Z}\}$, $s = 1, 2$. In the latter case, the sum-rate pre-log is given by

$$\Pi_{R_{1+2}} = \beta \min(n_r, n_{t,1}) + (1 - \beta) \min(n_r, n_{t,2}). \tag{57}$$

The following corollary presents a sufficient condition on $L^*$ under which the sum-rate pre-log
of the joint-transmission scheme is strictly larger than that of the coherent TDMA scheme (57),
as well as a sufficient condition on $L^*$ under which the sum-rate pre-log of the joint-transmission
scheme is strictly smaller than the sum-rate pre-log of the TDMA scheme given in Remark 2.
Since (57) is an upper bound on the sum-rate pre-log of any TDMA scheme over the MIMO
fading MAC (40), and since the sum-rate pre-log given in Remark 2 is a lower bound on the
sum-rate pre-log of the best TDMA scheme, it follows that the sufficient conditions presented
in Corollary 1 hold also for the best TDMA scheme.

Corollary 1. Consider the MIMO fading MAC model (40). The joint-transmission scheme
achieves a larger sum-rate pre-log than any TDMA scheme if

$$L^* > \frac{\min(n_r, n_{t,1} + n_{t,2})\,(n_{t,1} + n_{t,2})}{\min(n_r, n_{t,1} + n_{t,2}) - \min(n_r, \max(n_{t,1}, n_{t,2}))} \tag{58}$$

where we define $a/0 = \infty$ for every $a > 0$. Conversely, the best TDMA scheme achieves a larger
sum-rate pre-log than the joint-transmission scheme if

$$L^* < \frac{\min(n_r, n_{t,1} + n_{t,2})\,(n_{t,1} + n_{t,2}) - \min(n_{t,1} n_r,\ n_{t,1}^2,\ n_{t,2} n_r,\ n_{t,2}^2)}{\min(n_r, n_{t,1} + n_{t,2}) - \min(n_r, n_{t,1}, n_{t,2})}. \tag{59}$$

Recall that $L^*$ is inversely proportional to the bandwidth of the power spectral density $f_H(\cdot)$,
which in turn is inversely proportional to the coherence time of the fading channel. Corollary 1
thus demonstrates that the joint-transmission scheme tends to be superior to TDMA when the
coherence time of the channel is large. In contrast, TDMA is superior to the joint-transmission
scheme when the coherence time of the channel is small. Intuitively, this can be explained by
observing that, compared to TDMA, the joint-transmission scheme uses the multiple antennas at
the transmitters and at the receiver more efficiently, but requires more pilot symbols to estimate
the fading coefficients. Thus, when the coherence time is large, the number of pilot symbols
required to estimate the fading is small, so the gain in achievable rate by using the antennas
more efficiently dominates the loss incurred by requiring more pilot symbols. On the other hand,
when the coherence time is small, the number of pilot symbols required to estimate the fading
is large and the loss in achievable rate incurred by requiring more pilot symbols dominates the
gain by using the antennas more efficiently.
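
The two thresholds of Corollary 1 are easy to evaluate for given antenna numbers. The sketch below, with hypothetical function names, computes the right-hand sides of (58) and (59) and applies the convention $a/0 = \infty$.

```python
import math


def joint_threshold(n_r, n_t1, n_t2):
    """RHS of (58): joint transmission beats any TDMA scheme if L* exceeds it."""
    total = n_t1 + n_t2
    m = min(n_r, total)
    denom = m - min(n_r, max(n_t1, n_t2))
    # Convention a / 0 = infinity for a > 0.
    return math.inf if denom == 0 else m * total / denom


def tdma_threshold(n_r, n_t1, n_t2):
    """RHS of (59): the best TDMA scheme beats joint transmission below it."""
    total = n_t1 + n_t2
    m = min(n_r, total)
    denom = m - min(n_r, n_t1, n_t2)
    numer = m * total - min(n_t1 * n_r, n_t1 ** 2, n_t2 * n_r, n_t2 ** 2)
    return math.inf if denom == 0 else numer / denom


if __name__ == "__main__":
    for antennas in [(2, 1, 1), (1, 2, 2), (3, 4, 2)]:
        print(antennas, joint_threshold(*antennas), tdma_threshold(*antennas))
```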

We next evaluate (58) and (59) for some particular values of $n_r$, $n_{t,1}$, and $n_{t,2}$.

1) Receiver employs fewer antennas than transmitters: Suppose that $n_r \le \min(n_{t,1}, n_{t,2})$. Then,
the right-hand sides (RHSs) of (58) and (59) become $\infty$, so every finite $L^*$ satisfies (59). Thus, if
the number of receive antennas is smaller than the number of transmit antennas, then, irrespective
of $L^*$, TDMA is superior to the joint-transmission scheme.

2) Receiver employs more antennas than transmitters: Suppose that $n_r \ge n_{t,1} + n_{t,2}$, and
suppose that $n_{t,1} = n_{t,2} = n_t$. Then, (58) and (59) become

$$L^* > 4 n_t \tag{60}$$

and

$$L^* < 3 n_t. \tag{61}$$

Thus, if $L^*$ is greater than $4 n_t$, then the joint-transmission scheme is superior to TDMA. In
contrast, if $L^*$ is smaller than $3 n_t$, then TDMA is superior. This is illustrated in Fig. 5 for the
case where $n_r = 2$ and $n_{t,1} = n_{t,2} = 1$. Note that if $L^*$ is between $3 n_t$ and $4 n_t$, then the
joint-transmission scheme is superior to the TDMA scheme presented in Remark 2, but it may
be inferior to the best TDMA scheme.

3) A case in between: Suppose that $n_r < n_{t,1} + n_{t,2}$ and $n_{t,2} < n_r \le n_{t,1}$. Then, (58) becomes

$$L^* > \infty \tag{62}$$

and (59) becomes

$$L^* < n_{t,2} + \frac{n_r\, n_{t,1}}{n_r - n_{t,2}}. \tag{63}$$

Thus, in this case the joint-transmission scheme is always inferior to the coherent TDMA scheme
(57), but it can be superior to the TDMA scheme in Remark 2.
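
The three regimes of case 2) can also be seen by comparing the sum-rate pre-logs directly. This is a minimal sketch with a hypothetical function name; it uses the fact that the TDMA sum pre-logs in (53) and (57) are linear in $\beta$, so their maxima are attained at $\beta \in \{0, 1\}$.

```python
def sum_rate_prelogs(n_r, n_t1, n_t2, l_star):
    """Sum-rate pre-logs of the three schemes compared in this section.

    Returns (joint transmission per (51), TDMA per Remark 2, coherent TDMA per (57)),
    where the two TDMA values are maximized over the time-sharing factor beta.
    """
    total = n_t1 + n_t2
    joint = min(n_r, total) * (1.0 - total / l_star)
    tdma = max(min(n_r, n) * (1.0 - n / l_star) for n in (n_t1, n_t2))
    coherent = max(min(n_r, n) for n in (n_t1, n_t2))
    return joint, tdma, coherent


if __name__ == "__main__":
    # n_r = 4, n_t = 2: thresholds are 3 n_t = 6 and 4 n_t = 8.
    for l_star in (5, 7, 9):
        print(l_star, sum_rate_prelogs(4, 2, 2, l_star))
```

For $L^* = 7$, which lies between $3 n_t$ and $4 n_t$, the joint-transmission value falls between the two TDMA values, matching the remark at the end of case 2).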
[Figure 5: two panels, (a) $L^* < 3$ and (b) $L^* > 4$; legend: Coherent TDMA, Noncoherent TDMA, Noncoherent Joint-transmission.]

Fig. 5: Pre-log regions for a fading MAC with $n_r = 2$ and $n_{t,1} = n_{t,2} = 1$ for different values
of $L^*$. Depicted are the pre-log region for the joint-transmission scheme as given in Theorem 2
(dashed line), the pre-log region of the TDMA scheme as given in Remark 2 (solid line), and
the pre-log region of the coherent TDMA scheme (57) (dotted line).



C. Typical Values of L* 

We briefly discuss the range of values of $L^*$ that may occur in practical scenarios. To this end,
we first recall that $L^* = \lfloor 1/(2\lambda_D) \rfloor$, and that $\lambda_D$ is the bandwidth of the fading power spectral
density $f_H(\cdot)$, which can be associated with the Doppler spread of the channel as [10]

$$\lambda_D = \frac{f_m}{W_c}. \tag{64}$$

Here $f_m$ is the maximum Doppler shift given by

$$f_m = \frac{v}{c} f_c \tag{65}$$

where $v$ is the speed of the mobile device, $c = 3 \cdot 10^8$ m/s is the speed of light and $f_c$ is the
carrier frequency; and $W_c$ is the coherence bandwidth of the channel approximated as [10], [16]

$$W_c \approx \frac{1}{5 \sigma_\tau} \tag{66}$$

where $\sigma_\tau$ is the delay spread. Following the order of magnitude computations of Etkin and Tse
[10], we determine typical values of $\lambda_D$ for indoor, urban, and hilly area environments and for
carrier frequencies ranging from 800 MHz to 5 GHz and tabulate the results in Table I.

TABLE I: Typical values of $L^*$ for various environments with $f_c$ ranging from 800 MHz to 5
GHz. The values of the delay spread are taken from [10], [16] for indoor and urban environments
and from [17] for hilly area environments.

Environment   Delay spread σ_τ   Mobile speed v   λ_D ≈ 5 σ_τ (v/c) f_c    L*
Indoor        10 to 100 ns       5 km/h           2·10^-7 to 10^-5         5·10^4 to 2.5·10^6
Urban         1 to 2 μs          5 km/h           2·10^-5 to 2·10^-4       2.5·10^3 to 2.5·10^4
Urban         1 to 2 μs          75 km/h          2·10^-4 to 0.004         125 to 2.5·10^3
Hilly area    3 to 10 μs         200 km/h         0.002 to 0.05            10 to 250

For indoor environments and mobile speeds of 5 km/h, we have that $L^*$ is typically larger
than $5 \cdot 10^4$. For urban environments, $L^*$ is typically larger than $2.5 \cdot 10^3$ for mobile speeds of 5
km/h and larger than 125 for mobile speeds of 75 km/h. For hilly area environments and mobile
speeds of 200 km/h, $L^*$ ranges typically from 10 to 250. Thus, for most practical scenarios, $L^*$ is
typically large. It therefore follows that, if $n_r \ge n_{t,1} + n_{t,2}$, the condition (58) is satisfied unless
$n_{t,1} + n_{t,2}$ is very large. For example, if the receiver employs more antennas than the transmitters,
and if $n_{t,1} = n_{t,2} = n_t$, then $L^* > 4 n_t$ is satisfied even for urban environments and mobile speeds
of 75 km/h, as long as $n_t \le 30$. Only for hilly area environments and mobile speeds of 200
km/h, this condition may not be satisfied for a practical number of transmit antennas. Thus, if
the number of antennas at the receiver is sufficiently large, then the joint-transmission scheme is
superior to TDMA in most practical scenarios. On the other hand, if $n_r \le \min(n_{t,1}, n_{t,2})$, then
TDMA is always superior to the joint-transmission scheme, irrespective of how large $L^*$ is. This
suggests that one should use more antennas at the receiver than at the transmitters.
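
The order-of-magnitude computation behind Table I can be sketched as follows, using the relation $\lambda_D \approx 5 \sigma_\tau (v/c) f_c$ from the table header together with $L^* = \lfloor 1/(2\lambda_D) \rfloor$. The function names are hypothetical, and since the table entries are rounded, the numbers produced here agree with them only in order of magnitude.

```python
from math import floor

SPEED_OF_LIGHT = 3e8  # m/s, the value used in the text


def normalized_doppler(delay_spread_s, speed_kmh, carrier_hz):
    """lambda_D ~ f_m / W_c with f_m = (v / c) f_c and W_c ~ 1 / (5 sigma_tau)."""
    f_m = (speed_kmh / 3.6) / SPEED_OF_LIGHT * carrier_hz  # maximum Doppler shift
    w_c = 1.0 / (5.0 * delay_spread_s)                     # coherence bandwidth
    return f_m / w_c


def l_star(lam_d):
    """L* = floor(1 / (2 lambda_D))."""
    return floor(1.0 / (2.0 * lam_d))


if __name__ == "__main__":
    scenarios = [
        ("Indoor, 5 km/h", 10e-9, 100e-9, 5.0),
        ("Urban, 5 km/h", 1e-6, 2e-6, 5.0),
        ("Urban, 75 km/h", 1e-6, 2e-6, 75.0),
        ("Hilly area, 200 km/h", 3e-6, 10e-6, 200.0),
    ]
    for name, tau_min, tau_max, speed in scenarios:
        lam_min = normalized_doppler(tau_min, speed, 800e6)
        lam_max = normalized_doppler(tau_max, speed, 5e9)
        print(name, lam_min, lam_max, l_star(lam_max), l_star(lam_min))
```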

V. Proof of Theorem 1

Theorem 1 is proven as follows. We first characterize the estimation error from the linear
interpolator (7). We then compute the rates achievable with the communication scheme described
in Section II. Finally, we analyze the pre-log corresponding to these rates.

A. Linear Interpolator

We first note that the estimate of $H_k(r,t)$ is given by (7), namely,

$$\hat{H}_k^{(T)}(r,t) = \sum_{\substack{k' = k - TL: \\ k' \in \mathcal{P}}}^{k + TL} a_{k'}(r,t)\, Y_{k'}(r), \quad k \in \mathcal{D}. \tag{67}$$

We denote the interpolation error by $E_k^{(T)}(r,t) = H_k(r,t) - \hat{H}_k^{(T)}(r,t)$.

For future reference, and for any $k \in \mathbb{Z}$, we express $k = jL + \ell$, so $\ell = k \bmod L$. Assuming
that the first pilot symbol is transmitted at $k = 0$, it follows that $\ell = 0, \ldots, n_t - 1$ for $k \in \mathcal{P}$
and $\ell = n_t, \ldots, L - 1$ for $k \in \mathcal{D}$. The statistical properties of the channel estimator for a given
window size $T$ are summarized in the following lemma.
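
The index decomposition $k = jL + \ell$ can be made concrete with a short sketch. The function name is hypothetical, and the sketch assumes the first pilot is sent at $k = 0$ and ignores any guard period at the block edges.

```python
def classify_time_index(k, block_len, n_t):
    """Split k = j * L + l and report whether k carries a pilot or data.

    Positions l = 0, ..., n_t - 1 of every length-L block are pilots,
    positions l = n_t, ..., L - 1 are data (guard periods are ignored here).
    """
    j, l = divmod(k, block_len)  # floor division, so l is in [0, L) for negative k too
    kind = "pilot" if l < n_t else "data"
    return kind, j, l


if __name__ == "__main__":
    for k in range(-3, 8):
        print(k, classify_time_index(k, block_len=5, n_t=2))
```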

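Lemma 1. For a given $T$, the linear interpolator (67) has the following properties.

1) For each $t = 1, \ldots, n_t$, $r = 1, \ldots, n_r$ and $\ell = n_t, \ldots, L - 1$, the estimate $\hat{H}_{jL+\ell}^{(T)}(r,t)$
and the corresponding estimation error $E_{jL+\ell}^{(T)}(r,t)$ are independent zero-mean complex-
Gaussian random variables.

2) a) For a given transmit antenna $t$ and $\ell \in \{n_t, \ldots, L - 1\}$, the $n_r$ processes

$$\bigl\{ \bigl(\hat{H}_{jL+\ell}^{(T)}(1,t),\, E_{jL+\ell}^{(T)}(1,t)\bigr),\ j \in \mathbb{Z} \bigr\}, \ldots, \bigl\{ \bigl(\hat{H}_{jL+\ell}^{(T)}(n_r,t),\, E_{jL+\ell}^{(T)}(n_r,t)\bigr),\ j \in \mathbb{Z} \bigr\}$$

are independent and have the same law.
b) For a given receive antenna $r$ and $\ell \in \{n_t, \ldots, L - 1\}$, the $n_t$ processes

$$\bigl\{ \bigl(\hat{H}_{jL+\ell}^{(T)}(r,1),\, E_{jL+\ell}^{(T)}(r,1)\bigr),\ j \in \mathbb{Z} \bigr\}, \ldots, \bigl\{ \bigl(\hat{H}_{jL+\ell}^{(T)}(r,n_t),\, E_{jL+\ell}^{(T)}(r,n_t)\bigr),\ j \in \mathbb{Z} \bigr\}$$

are independent but have different laws.

3) For each $\ell = n_t, \ldots, L - 1$, the process $\bigl\{ \bigl(\hat{\mathbf{H}}_{jL+\ell}^{(T)}, \mathbf{H}_{jL+\ell}, \mathbf{Z}_{jL+\ell}, \mathbf{X}_{jL+\ell}\bigr),\ j \in \mathbb{Z} \bigr\}$ is jointly
stationary and ergodic.

4) For $\ell = n_t, \ldots, L - 1$, it holds that

(68)

where $(\cdot)^\dagger$ denotes the conjugate transpose.
Proof: See the Appendix. ■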

B. Achievable Rates and Pre-Logs 

In the following proof, we only consider the case where $n_t = n_r$. The more general case of
$n_t \ne n_r$ follows then by employing only $n_r$ transmit antennas or by ignoring $n_r - n_t$ antennas
at the receiver. This yields a lower bound on the maximum achievable rate and does not incur
a loss with respect to the pre-log. Indeed, it can be shown that the nearest neighbor decoder
described in Section II achieves the pre-log $\min(n_r, n_t)$. Thus, increasing $n_t$ beyond $n_r$ or $n_r$
beyond $n_t$ does not improve the pre-log achievable by such a decoder. In fact, increasing $n_t$
beyond $n_r$ requires the transmission of more pilot symbols and does therefore even reduce the
pre-log achievable with the communication system described in Section II.
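
The effect of unused transmit antennas can be illustrated numerically. The sketch below assumes, in line with the sum-rate bound (158) read as a point-to-point result, that using all $n_t$ transmit antennas yields the pre-log $\min(n_t, n_r)(1 - n_t/L)$, whereas activating only $\min(n_t, n_r)$ of them yields $\min(n_t, n_r)(1 - \min(n_t, n_r)/L)$. The function names are hypothetical.

```python
def prelog_all_antennas(n_t, n_r, block_len):
    """Pre-log when all n_t transmit antennas send pilots (assumed form, see lead-in)."""
    return min(n_t, n_r) * (1.0 - n_t / block_len)


def prelog_restricted(n_t, n_r, block_len):
    """Pre-log when only min(n_t, n_r) transmit antennas are activated."""
    n = min(n_t, n_r)
    return n * (1.0 - n / block_len)


if __name__ == "__main__":
    # With n_r = 2, adding transmit antennas beyond 2 only costs pilot overhead.
    for n_t in (1, 2, 3, 4):
        print(n_t, prelog_all_antennas(n_t, 2, 10), prelog_restricted(n_t, 2, 10))
```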

To prove Theorem 1, we analyze the generalized mutual information (GMI) [18] for the
channel and communication scheme in Section II. The GMI, denoted by $I_T^{\mathrm{gmi}}(\mathrm{SNR})$, specifies the
highest information rate for which the average probability of error, averaged over the ensemble
of i.i.d. Gaussian codebooks, tends to zero as the codeword length $n$ tends to infinity (see [5],
[11], [12] and references therein). The GMI for stationary Gaussian fading channels employing
nearest neighbor decoding has been evaluated in [11], [12] for the case where a genie provides
the receiver with an estimate of the fading process. However, the estimate considered in [11],
[12] is assumed to be jointly stationary ergodic with $\{(\mathbf{H}_k, \mathbf{X}_k, \mathbf{Z}_k),\ k \in \mathbb{Z}\}$, which is not
satisfied by $\{\hat{\mathbf{H}}_k^{(T)},\ k \in \mathcal{D}\}$. We therefore need to adapt the work in [11], [12] to our channel
model. For completeness, we present all the main steps here, even though they are very similar
to the ones in [11], [12].

We prove Theorem 1 as follows:

1) We compute a lower bound on $I_T^{\mathrm{gmi}}(\mathrm{SNR})$ for a fixed window size $T$.

2) We analyze the behavior of this lower bound as $T$ tends to infinity.

3) We evaluate the limiting ratio of this lower bound to $\log \mathrm{SNR}$ as SNR tends to infinity.

1) $I_T^{\mathrm{gmi}}(\mathrm{SNR})$ for a fixed $T$: We analyze the GMI for a fixed $T$ using a random coding
upper bound on the average error probability. Note that due to the symmetry of the codebook
construction, it suffices to consider the error behavior, conditioned on the event that message 1
was transmitted. Let $\mathcal{E}(m')$ denote the event that $D(m') \le D(1)$. The ensemble-average error
probability, where the average is over the ensemble of i.i.d. Gaussian codes, corresponding to
message $m = 1$ is thus given by

$$\bar{P}_e(1) = \Pr\Bigl\{ \bigcup_{m' \ne 1} \mathcal{E}(m') \Bigr\}. \tag{69}$$

To evaluate the GMI from the RHS of (69), we define some useful quantities in the following.
Recall the channel and the transmission model in Section II. Without loss of generality, assume
that the first pilot vector is transmitted at time $k = 0$. Define $F(\mathrm{SNR})$ as

$$F(\mathrm{SNR}) \triangleq n_r + \frac{\mathrm{SNR}}{(L - n_t)\, n_t} \sum_{\ell = n_t}^{L-1} \mathsf{E}\Bigl[ \bigl\| \mathbf{E}_\ell^{(T)} \bigr\|_F^2 \Bigr] \tag{70}$$

where $\mathbf{E}_\ell^{(T)}$ is a random matrix with element at row $r$ and column $t$ given by $E_\ell^{(T)}(r,t)$, and
where $\|\cdot\|_F$ denotes the Frobenius norm. Further define a typical set

$$\mathcal{T}_\delta \triangleq \Bigl\{ \bigl(\mathbf{x}_k, \mathbf{y}_k, \hat{\mathbf{H}}_k^{(T)}\bigr),\ k = 0, \ldots, n' - 1 : \Bigl| \frac{1}{n} \sum_{k \in \mathcal{D}^{(n')}} \Bigl\| \mathbf{y}_k - \sqrt{\tfrac{\mathrm{SNR}}{n_t}}\, \hat{\mathbf{H}}_k^{(T)} \mathbf{x}_k \Bigr\|^2 - F(\mathrm{SNR}) \Bigr| < \delta \Bigr\} \tag{71}$$

with $\mathcal{D}^{(n')} = \{0, \ldots, n' - 1\} \cap \mathcal{D}$ as provided in (18) and some $\delta > 0$, where we have recalled
$n'$ in (4), namely

$$n' = n_p + n + n_g. \tag{72}$$

Then, we have the following convergence as $n$ tends to infinity.

Lemma 2. For the communication scheme described in Section II, we have that

$$\lim_{n \to \infty} \Pr\Bigl\{ \bigl(\mathbf{X}^{n'}, \mathbf{Y}^{n'}, \hat{\mathbf{H}}^{(T),n'}\bigr) \in \mathcal{T}_\delta \Bigr\} = 1, \quad \forall \delta > 0 \tag{73}$$

where we have used the notation $U^{n'}$ to denote the sequence $U_0, \ldots, U_{n'-1}$.
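Proof: We have

\begin{align}
\lim_{n \to \infty} \frac{1}{n} \sum_{k \in \mathcal{D}^{(n')}} \Bigl\| \mathbf{Y}_k - \sqrt{\tfrac{\mathrm{SNR}}{n_t}}\, \hat{\mathbf{H}}_k^{(T)} \mathbf{X}_k \Bigr\|^2
&= \lim_{n \to \infty} \frac{1}{n} \sum_{k \in \mathcal{D}^{(n')}} \Bigl\| \sqrt{\tfrac{\mathrm{SNR}}{n_t}}\, \mathbf{E}_k^{(T)} \mathbf{X}_k + \mathbf{Z}_k \Bigr\|^2 \tag{74} \\
&= \frac{1}{L - n_t} \sum_{\ell = n_t}^{L-1} \lim_{n \to \infty} \frac{L - n_t}{n} \sum_{j=0}^{\frac{n}{L - n_t} - 1} \Bigl\| \sqrt{\tfrac{\mathrm{SNR}}{n_t}}\, \mathbf{E}_{jL+\ell}^{(T)} \mathbf{X}_{jL+\ell} + \mathbf{Z}_{jL+\ell} \Bigr\|^2 \tag{75} \\
&= \frac{1}{L - n_t} \sum_{\ell = n_t}^{L-1} \mathsf{E}\Bigl[ \Bigl\| \sqrt{\tfrac{\mathrm{SNR}}{n_t}}\, \mathbf{E}_\ell^{(T)} \mathbf{X}_\ell + \mathbf{Z}_\ell \Bigr\|^2 \Bigr], \quad \text{almost surely} \tag{76} \\
&= n_r + \frac{\mathrm{SNR}}{(L - n_t)\, n_t} \sum_{\ell = n_t}^{L-1} \mathsf{E}\Bigl[ \bigl\| \mathbf{E}_\ell^{(T)} \mathbf{X}_\ell \bigr\|^2 \Bigr] \tag{77} \\
&= F(\mathrm{SNR}). \tag{78}
\end{align}

Herein (76) follows from Part 3) of Lemma 1 and the ergodic theorem [19, Chap. 7]; (77) follows
from Part 4) of Lemma 1; and (78) follows since $\mathbf{X}_\ell$ has zero mean and covariance matrix $\mathbf{I}_{n_t}$
and is independent from $\mathbf{E}_\ell^{(T)}$ (since $\{\mathbf{E}_k^{(T)},\ k \in \mathcal{D}\}$ is a function of $\{(\mathbf{H}_k, \mathbf{Z}_k),\ k \in \mathbb{Z}\}$). It thus
follows that

$$\frac{1}{n} \sum_{k \in \mathcal{D}^{(n')}} \Bigl\| \mathbf{Y}_k - \sqrt{\tfrac{\mathrm{SNR}}{n_t}}\, \hat{\mathbf{H}}_k^{(T)} \mathbf{X}_k \Bigr\|^2 \tag{79}$$

converges to $F(\mathrm{SNR})$ almost surely, which in turn implies that it also converges in probability,
thus proving (73). ■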
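Considering the typical set (71) and following the derivation in [11], [12], $\bar{P}_e(1)$ in (69) can
be upper-bounded as

$$\bar{P}_e(1) \le e^{nR} \cdot \Pr\Bigl\{ \frac{1}{n} D(m') < F(\mathrm{SNR}) + \delta \Bigm| \bigl(\mathbf{X}^{n'}(1), \mathbf{Y}^{n'}, \hat{\mathbf{H}}^{(T),n'}\bigr) \in \mathcal{T}_\delta \Bigr\} + \Pr\Bigl\{ \bigl(\mathbf{X}^{n'}(1), \mathbf{Y}^{n'}, \hat{\mathbf{H}}^{(T),n'}\bigr) \in \mathcal{T}_\delta^c \Bigr\}, \quad m' \ne 1 \tag{80}$$

where $\mathcal{T}_\delta^c$ denotes the complement of $\mathcal{T}_\delta$. It follows from Lemma 2 that the second term on the
RHS of (80) can be made arbitrarily small by letting $n$ tend to infinity.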
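The GMI characterizes the rate of exponential decay of the expression

$$\Pr\Bigl\{ \frac{1}{n} D(m') < F(\mathrm{SNR}) + \delta \Bigm| \bigl(\mathbf{X}^{n'}(1), \mathbf{Y}^{n'}, \hat{\mathbf{H}}^{(T),n'}\bigr) \in \mathcal{T}_\delta \Bigr\} \tag{81}$$

as $n \to \infty$ [11], [12]. The computation of the GMI requires the conditional log moment-
generating function of the metric $D(m')$ associated with the wrong message $m' \ne 1$,
conditioned on the channel outputs and on the fading estimates, which we shall denote by
$\kappa_n\bigl(\theta, \mathbf{y}^{n'}, \hat{\mathbf{H}}^{(T),n'}\bigr)$, i.e.,

$$\kappa_n\bigl(\theta, \mathbf{y}^{n'}, \hat{\mathbf{H}}^{(T),n'}\bigr) = \log \mathsf{E}\Bigl[ \exp\Bigl( \frac{\theta}{n} \sum_{k \in \mathcal{D}^{(n')}} D_k(m') \Bigr) \Bigm| \mathbf{y}^{n'}, \hat{\mathbf{H}}^{(T),n'} \Bigr]. \tag{82}$$

Here we define

$$D_k(m') \triangleq \Bigl\| \mathbf{y}_k - \sqrt{\tfrac{\mathrm{SNR}}{n_t}}\, \hat{\mathbf{H}}_k^{(T)} \mathbf{X}_k(m') \Bigr\|^2. \tag{83}$$

Proceeding along the lines of [11], [12], we can express the conditional log moment-generating
function in (82) as the sum of conditional log moment-generating functions for the individual
vector metrics $D_k(m')$, $k \in \mathcal{D}^{(n')}$, i.e.,

\begin{align}
\kappa_n\bigl(\theta, \mathbf{y}^{n'}, \hat{\mathbf{H}}^{(T),n'}\bigr)
&= \sum_{k \in \mathcal{D}^{(n')}} \log \mathsf{E}\Bigl[ e^{\frac{\theta}{n} D_k(m')} \Bigm| \mathbf{y}_k, \hat{\mathbf{H}}_k^{(T)} \Bigr] \tag{84} \\
&= \sum_{k \in \mathcal{D}^{(n')}} \Bigl( \frac{\theta}{n}\, \mathbf{y}_k^\dagger \Bigl( \mathbf{I}_{n_r} - \frac{\theta}{n} \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_k^{(T)} \hat{\mathbf{H}}_k^{(T)\dagger} \Bigr)^{-1} \mathbf{y}_k - \log\det\Bigl( \mathbf{I}_{n_r} - \frac{\theta}{n} \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_k^{(T)} \hat{\mathbf{H}}_k^{(T)\dagger} \Bigr) \Bigr). \tag{85}
\end{align}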
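We then have that for all $\theta < 0$

\begin{align}
\lim_{n \to \infty} \frac{1}{n} \kappa_n\bigl(n\theta, \mathbf{y}^{n'}, \hat{\mathbf{H}}^{(T),n'}\bigr)
&= \lim_{n \to \infty} \frac{1}{n} \sum_{k \in \mathcal{D}^{(n')}} \Bigl( \theta\, \mathbf{y}_k^\dagger \Bigl( \mathbf{I}_{n_r} - \theta \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_k^{(T)} \hat{\mathbf{H}}_k^{(T)\dagger} \Bigr)^{-1} \mathbf{y}_k - \log\det\Bigl( \mathbf{I}_{n_r} - \theta \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_k^{(T)} \hat{\mathbf{H}}_k^{(T)\dagger} \Bigr) \Bigr) \tag{86} \\
&= \frac{1}{L - n_t} \sum_{\ell = n_t}^{L-1} \lim_{n \to \infty} \frac{L - n_t}{n} \sum_{j=0}^{\frac{n}{L - n_t} - 1} \Bigl( \theta\, \mathbf{y}_{jL+\ell}^\dagger \Bigl( \mathbf{I}_{n_r} - \theta \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_{jL+\ell}^{(T)} \hat{\mathbf{H}}_{jL+\ell}^{(T)\dagger} \Bigr)^{-1} \mathbf{y}_{jL+\ell} - \log\det\Bigl( \mathbf{I}_{n_r} - \theta \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_{jL+\ell}^{(T)} \hat{\mathbf{H}}_{jL+\ell}^{(T)\dagger} \Bigr) \Bigr) \tag{87} \\
&= \frac{1}{L - n_t} \sum_{\ell = n_t}^{L-1} \Bigl( \mathsf{E}\Bigl[ \theta\, \mathbf{Y}_\ell^\dagger \Bigl( \mathbf{I}_{n_r} - \theta \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_\ell^{(T)} \hat{\mathbf{H}}_\ell^{(T)\dagger} \Bigr)^{-1} \mathbf{Y}_\ell \Bigr] - \mathsf{E}\Bigl[ \log\det\Bigl( \mathbf{I}_{n_r} - \theta \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_\ell^{(T)} \hat{\mathbf{H}}_\ell^{(T)\dagger} \Bigr) \Bigr] \Bigr), \quad \text{almost surely} \tag{88} \\
&\triangleq \kappa(\theta, \mathrm{SNR}) \nonumber
\end{align}

where the last step should be regarded as the definition of $\kappa(\theta, \mathrm{SNR})$. The convergence in (88)
is due to the ergodicity of $\{(\mathbf{Y}_{jL+\ell}, \hat{\mathbf{H}}_{jL+\ell}^{(T)}),\ j \in \mathbb{Z}\}$, $\ell = n_t, \ldots, L - 1$ (see Part 3) of Lemma
1) and the ergodic theorem.

Following the same steps as in [11], [12], we can then show that for all $\delta' > 0$, the ensemble-
average error probability can be bounded as

$$\bar{P}_e(1) \le \exp(nR) \exp\Bigl( -n \bigl( I_T^{\mathrm{gmi}}(\mathrm{SNR}) - \delta' \bigr) \Bigr) + \varepsilon(\delta', n) \tag{89}$$

for some $\varepsilon(\delta', n)$ satisfying

$$\lim_{n \to \infty} \varepsilon(\delta', n) = 0, \quad \delta' > 0. \tag{90}$$

On the RHS of (89), $I_T^{\mathrm{gmi}}(\mathrm{SNR})$ denotes the GMI as a function of SNR for a fixed $T$, which is
given by

$$I_T^{\mathrm{gmi}}(\mathrm{SNR}) = \frac{L - n_t}{L} \Bigl( \sup_{\theta < 0} \bigl( \theta F(\mathrm{SNR}) - \kappa(\theta, \mathrm{SNR}) \bigr) \Bigr). \tag{91}$$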
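Herein the pre-factor $(L - n_t)/L$ equals the fraction of time used for data transmission. The
bound (89) implies that for rates below $I_T^{\mathrm{gmi}}(\mathrm{SNR})$, the communication scheme described in
Section II has vanishing error probability as $n$ tends to infinity. Combining (70) and (88) with
(91) yields

$$I_T^{\mathrm{gmi}}(\mathrm{SNR}) = \sup_{\theta < 0} \frac{1}{L} \sum_{\ell = n_t}^{L-1} \Bigl\{ \theta \Bigl( n_r + \frac{\mathrm{SNR}}{n_t} \mathsf{E}\Bigl[ \bigl\| \mathbf{E}_\ell^{(T)} \bigr\|_F^2 \Bigr] \Bigr) - \mathsf{E}\Bigl[ \theta\, \mathbf{Y}_\ell^\dagger \Bigl( \mathbf{I}_{n_r} - \theta \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_\ell^{(T)} \hat{\mathbf{H}}_\ell^{(T)\dagger} \Bigr)^{-1} \mathbf{Y}_\ell \Bigr] + \mathsf{E}\Bigl[ \log\det\Bigl( \mathbf{I}_{n_r} - \theta \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_\ell^{(T)} \hat{\mathbf{H}}_\ell^{(T)\dagger} \Bigr) \Bigr] \Bigr\}. \tag{92}$$

Following the steps used in [20, App. D], it can be shown that for $\theta < 0$

$$- \mathsf{E}\Bigl[ \theta\, \mathbf{Y}_\ell^\dagger \Bigl( \mathbf{I}_{n_r} - \theta \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_\ell^{(T)} \hat{\mathbf{H}}_\ell^{(T)\dagger} \Bigr)^{-1} \mathbf{Y}_\ell \Bigr] \ge 0. \tag{93}$$

As observed in [20, App. D], a good lower bound on $I_T^{\mathrm{gmi}}(\mathrm{SNR})$ for high SNR follows by
choosing

$$\theta = \frac{-1}{n_r + \mathrm{SNR}\, n_r\, \epsilon_{*,T}^2} \tag{94}$$

where

$$\epsilon_{*,T}^2 = \max_{\substack{r = 1, \ldots, n_r, \\ t = 1, \ldots, n_t, \\ \ell = n_t, \ldots, L-1}} \epsilon_{\ell,T}^2(r,t). \tag{95}$$

Hence, substituting the choice of $\theta$ in (94) and applying (93) to the RHS of (92) yields

$$I_T^{\mathrm{gmi}}(\mathrm{SNR}) \ge \frac{1}{L} \sum_{\ell = n_t}^{L-1} \Bigl( \mathsf{E}\Bigl[ \log\det\Bigl( \mathbf{I}_{n_r} + \frac{\mathrm{SNR}}{n_t n_r + n_t n_r \mathrm{SNR}\, \epsilon_{*,T}^2} \hat{\mathbf{H}}_\ell^{(T)} \hat{\mathbf{H}}_\ell^{(T)\dagger} \Bigr) \Bigr] - 1 \Bigr). \tag{96}$$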



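2) Achievable Rates as $T \to \infty$: We next analyze the RHS of (96) in the limit as $T$ tends to
infinity. To this end, we note that, for $L \le \frac{1}{2\lambda_D}$, the variance of the interpolation error tends to
(15), namely

$$\lim_{T \to \infty} \epsilon_{\ell,T}^2(r,t) = 1 - \int_{-1/2}^{1/2} \frac{\mathrm{SNR}\, [f_H(\lambda)]^2}{\mathrm{SNR}\, f_H(\lambda) + L n_t}\, d\lambda \tag{97}$$

irrespective of $\ell$ and $t$. We shall therefore denote the limiting variance of the interpolation error by
$\epsilon^2$. Note that for a fixed $T$, the entries of

$$\frac{1}{\sqrt{n_t n_r + n_t n_r \mathrm{SNR}\, \epsilon_{*,T}^2}}\, \hat{\mathbf{H}}_\ell^{(T)} \tag{98}$$

are independent but not i.i.d., which follows from Part 2) of Lemma 1. However, as $T$ tends to
infinity, their distribution becomes identical due to (97) and hence we have the convergence in
distribution

$$\frac{1}{\sqrt{n_t n_r + n_t n_r \mathrm{SNR}\, \epsilon_{*,T}^2}}\, \hat{\mathbf{H}}_\ell^{(T)} \xrightarrow{d} \frac{1}{\sqrt{n_t n_r + n_t n_r \mathrm{SNR}\, \epsilon^2}}\, \bar{\mathbf{H}} \tag{99}$$

where the entries of $\bar{\mathbf{H}}$ are i.i.d. complex-Gaussian random variables with zero mean and variance
$1 - \epsilon^2$. Note that

$$\log\det\Bigl( \mathbf{I}_{n_r} + \frac{\mathrm{SNR}}{n_t n_r + n_t n_r \mathrm{SNR}\, \epsilon_{*,T}^2} \hat{\mathbf{H}}_\ell^{(T)} \hat{\mathbf{H}}_\ell^{(T)\dagger} \Bigr) \ge 0 \tag{100}$$

is a continuous function with respect to the entries of the matrix

$$\frac{1}{\sqrt{n_t n_r + n_t n_r \mathrm{SNR}\, \epsilon_{*,T}^2}}\, \hat{\mathbf{H}}_\ell^{(T)}. \tag{101}$$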



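It therefore follows from Portmanteau's lemma [21] that, as $T \to \infty$, the RHS of (96) can be
lower-bounded by

\begin{align}
\lim_{T \to \infty} \frac{1}{L} \sum_{\ell = n_t}^{L-1} \Bigl( \mathsf{E}\Bigl[ \log\det\Bigl( \mathbf{I}_{n_r} + \frac{\mathrm{SNR}}{n_t n_r + n_t n_r \mathrm{SNR}\, \epsilon_{*,T}^2} \hat{\mathbf{H}}_\ell^{(T)} \hat{\mathbf{H}}_\ell^{(T)\dagger} \Bigr) \Bigr] - 1 \Bigr)
&\ge \frac{L - n_t}{L} \Bigl( \mathsf{E}\Bigl[ \log\det\Bigl( \mathbf{I}_{n_r} + \frac{\mathrm{SNR}}{n_t n_r + n_t n_r \mathrm{SNR}\, \epsilon^2} \bar{\mathbf{H}} \bar{\mathbf{H}}^\dagger \Bigr) \Bigr] - 1 \Bigr) \tag{102} \\
&\ge \frac{L - n_t}{L} \Bigl( \mathsf{E}\Bigl[ \log\det\Bigl( \frac{\mathrm{SNR}}{n_t n_r + n_t n_r \mathrm{SNR}\, \epsilon^2} \bar{\mathbf{H}} \bar{\mathbf{H}}^\dagger \Bigr) \Bigr] - 1 \Bigr) \tag{103}
\end{align}

where the last inequality follows from the lower bound $\log\det(\mathbf{I} + \mathbf{A}) \ge \log\det \mathbf{A}$. Combining
(103) with (96) yields

\begin{align}
I^{\mathrm{gmi}}(\mathrm{SNR}) &= \lim_{T \to \infty} I_T^{\mathrm{gmi}}(\mathrm{SNR}) \tag{104} \\
&\ge \frac{L - n_t}{L} \Bigl( n_t \log \mathrm{SNR} - n_t \log\bigl( n_t^2 + n_t^2\, \mathrm{SNR}\, \epsilon^2 \bigr) + \mathsf{E}\bigl[ \log\det \bar{\mathbf{H}} \bar{\mathbf{H}}^\dagger \bigr] - 1 \Bigr) \tag{105}
\end{align}

where in the last inequality we have used the assumption $n_t = n_r$.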

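3) The Pre-Log: We next compute a lower bound on the pre-log by computing the limiting
ratio of the RHS of (105) to $\log \mathrm{SNR}$ as SNR tends to infinity. To this end, we first consider

$$\mathrm{SNR}\, \epsilon^2 = \int_{-1/2}^{1/2} \frac{\mathrm{SNR}\, f_H(\lambda)\, L n_t}{\mathrm{SNR}\, f_H(\lambda) + L n_t}\, d\lambda. \tag{106}$$

Since the integrand is bounded by

$$0 \le \frac{\mathrm{SNR}\, f_H(\lambda)\, L n_t}{\mathrm{SNR}\, f_H(\lambda) + L n_t} \le L n_t \tag{107}$$

it follows that

$$0 \le \mathrm{SNR}\, \epsilon^2 \le L n_t \tag{108}$$

which implies that

$$\lim_{\mathrm{SNR} \to \infty} \frac{\log\bigl( n_t^2 + n_t^2\, \mathrm{SNR}\, \epsilon^2 \bigr)}{\log \mathrm{SNR}} = 0. \tag{109}$$

We next consider the term $\mathsf{E}\bigl[\log\det \bar{\mathbf{H}} \bar{\mathbf{H}}^\dagger\bigr] - 1$. Note that by [22, Lemma A.2] and by the
assumption $n_t = n_r$, we have

$$\mathsf{E}\bigl[ \log\det \bar{\mathbf{H}} \bar{\mathbf{H}}^\dagger \bigr] - 1 = n_t \log(1 - \epsilon^2) + \sum_{b=0}^{n_t - 1} \psi(n_t - b) - 1 \tag{110}$$

where $\psi(\cdot)$ is Euler's digamma function [23]. Furthermore, since

$$\frac{\mathrm{SNR}\, [f_H(\lambda)]^2}{\mathrm{SNR}\, f_H(\lambda) + L n_t} \le f_H(\lambda) \tag{111}$$

we have by the Dominated Convergence Theorem [19] that

$$\lim_{\mathrm{SNR} \to \infty} \epsilon^2 = \lim_{\mathrm{SNR} \to \infty} \Bigl( 1 - \int_{-1/2}^{1/2} \frac{\mathrm{SNR}\, [f_H(\lambda)]^2}{\mathrm{SNR}\, f_H(\lambda) + L n_t}\, d\lambda \Bigr) = 0 \tag{112}$$

so $\log(1 - \epsilon^2)$ vanishes as the SNR tends to infinity. Combining (112) with (110) yields

$$\lim_{\mathrm{SNR} \to \infty} \frac{\mathsf{E}\bigl[ \log\det \bar{\mathbf{H}} \bar{\mathbf{H}}^\dagger \bigr] - 1}{\log \mathrm{SNR}} = 0. \tag{113}$$

It thus follows from (105), (109) and (113) that

\begin{align}
\Pi_{R^*} &\ge n_t \Bigl( 1 - \frac{n_t}{L} \Bigr) \tag{114} \\
&= \min(n_t, n_r) \Bigl( 1 - \frac{\min(n_t, n_r)}{L} \Bigr), \quad L \le \frac{1}{2\lambda_D} \tag{115}
\end{align}

where we have used that $n_t = n_r = \min(n_t, n_r)$. Note that the condition $L \le \frac{1}{2\lambda_D}$ is necessary
since otherwise (97) would not hold. This proves Theorem 1.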
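
To see how quickly the ratio of the bound (105) to $\log \mathrm{SNR}$ approaches $n_t(1 - n_t/L)$, the bound can be evaluated numerically. The sketch below rests on an assumption that is not part of the text: a rectangular power spectral density $f_H(\lambda) = 1/(2\lambda_D)$ for $|\lambda| \le \lambda_D$, for which (97) evaluates in closed form to $\epsilon^2 = 2\lambda_D L n_t / (\mathrm{SNR} + 2\lambda_D L n_t)$. The function names are hypothetical, rates are in nats, and the expectation term is taken from (110).

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Euler's digamma function at a positive integer: psi(n) = -gamma + sum_{k<n} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def eps2_rectangular(snr, n_t, block_len, lam_d):
    """(97) for the assumed rectangular PSD f_H = 1 / (2 lam_d) on [-lam_d, lam_d]."""
    a = 2.0 * lam_d * block_len * n_t
    return a / (snr + a)


def gmi_lower_bound(snr, n_t, block_len, lam_d):
    """RHS of (105) with n_r = n_t, using (110) for the expectation term."""
    if 2.0 * lam_d * block_len > 1.0 + 1e-12:
        raise ValueError("requires L <= 1 / (2 lambda_D)")
    eps2 = eps2_rectangular(snr, n_t, block_len, lam_d)
    snr_eps2 = snr * eps2
    log_det_term = n_t * math.log1p(-eps2) + sum(
        digamma_int(n_t - b) for b in range(n_t)
    )
    return (block_len - n_t) / block_len * (
        n_t * math.log(snr)
        - n_t * math.log(n_t ** 2 + n_t ** 2 * snr_eps2)
        + log_det_term
        - 1.0
    )


if __name__ == "__main__":
    n_t, block_len, lam_d = 2, 10, 0.05
    target = n_t * (1.0 - n_t / block_len)
    for exponent in (3, 10, 30, 100, 300):
        snr = 10.0 ** exponent
        ratio = gmi_lower_bound(snr, n_t, block_len, lam_d) / math.log(snr)
        print(exponent, ratio, target)
```

The convergence is slow because the constant terms in (105) are divided by $\log \mathrm{SNR}$, which is why very large SNR values are used in the loop.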



C. A Note on Input Distribution 

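The pre-log in Theorem 1 is derived using codebooks whose entries are drawn i.i.d. from
an $n_t$-variate Gaussian distribution with zero mean and identity covariance matrix. However,
Gaussian inputs are not necessary to achieve the pre-log (25). In fact, (25) can be achieved by
any i.i.d. inputs having a density $p_{\mathbf{X}}(\cdot)$ satisfying $\mathsf{E}[\|\mathbf{X}\|^2] \le n_t$ and (26) and (27), namely,

$$p_{\mathbf{X}}(\mathbf{x}) \le \frac{K}{\pi^{n_t}}\, e^{-\|\mathbf{x}\|^2} \tag{116}$$

$$\lim_{\mathrm{SNR} \to \infty} \frac{\log K}{\log \mathrm{SNR}} = 0. \tag{117}$$

Note that the fact that the inputs have a density implies that $\mathsf{E}[\|\mathbf{X}\|^2] > 0$. To show that the
conditions (26) and (27) suffice to achieve (25), we follow the steps in Section V-B but with
$F(\mathrm{SNR})$ replaced by

$$F(\mathrm{SNR}) = n_r + \frac{\mathrm{SNR}}{(L - n_t)\, n_t} \sum_{\ell = n_t}^{L-1} \mathsf{E}\Bigl[ \bigl\| \mathbf{E}_\ell^{(T)} \mathbf{X}_\ell \bigr\|^2 \Bigr]. \tag{118}$$

We then upper-bound $F(\mathrm{SNR})$ and $\kappa(\theta, \mathrm{SNR})$ as follows. Using that for any two matrices $\mathbf{A}$ and
$\mathbf{B}$ we have $\|\mathbf{A}\mathbf{B}\|_F \le \|\mathbf{A}\|_F \|\mathbf{B}\|_F$ [24, Sec. 5.6] and that $\mathbf{E}_\ell^{(T)}$ and $\mathbf{X}_\ell$ are independent, we can
upper-bound $F(\mathrm{SNR})$ by

$$F(\mathrm{SNR}) \le n_r + \frac{\mathrm{SNR}}{(L - n_t)\, n_t} \sum_{\ell = n_t}^{L-1} \mathsf{E}\Bigl[ \bigl\| \mathbf{E}_\ell^{(T)} \bigr\|_F^2 \Bigr] \cdot \mathsf{E}\bigl[ \|\mathbf{X}\|^2 \bigr]. \tag{119}$$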
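As for $\kappa(\theta, \mathrm{SNR})$, we have

\begin{align}
\mathsf{E}\Bigl[ \exp\Bigl( \frac{\theta}{n} D_k(m') \Bigr) \Bigm| \mathbf{y}_k, \hat{\mathbf{H}}_k^{(T)} \Bigr]
&= \int p_{\mathbf{X}}(\mathbf{x}_k) \exp\Bigl( \frac{\theta}{n} \Bigl\| \mathbf{y}_k - \sqrt{\tfrac{\mathrm{SNR}}{n_t}}\, \hat{\mathbf{H}}_k^{(T)} \mathbf{x}_k \Bigr\|^2 \Bigr) d\mathbf{x}_k \tag{120} \\
&\le \frac{K}{\pi^{n_t}} \int \exp\bigl( -\|\mathbf{x}_k\|^2 \bigr) \exp\Bigl( \frac{\theta}{n} \Bigl\| \mathbf{y}_k - \sqrt{\tfrac{\mathrm{SNR}}{n_t}}\, \hat{\mathbf{H}}_k^{(T)} \mathbf{x}_k \Bigr\|^2 \Bigr) d\mathbf{x}_k \tag{121} \\
&= K\, \frac{\exp\Bigl( \frac{\theta}{n}\, \mathbf{y}_k^\dagger \Bigl( \mathbf{I}_{n_r} - \frac{\theta}{n} \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_k^{(T)} \hat{\mathbf{H}}_k^{(T)\dagger} \Bigr)^{-1} \mathbf{y}_k \Bigr)}{\det\Bigl( \mathbf{I}_{n_r} - \frac{\theta}{n} \frac{\mathrm{SNR}}{n_t} \hat{\mathbf{H}}_k^{(T)} \hat{\mathbf{H}}_k^{(T)\dagger} \Bigr)}. \tag{122}
\end{align}

Here (121) follows from (116), and (122) follows by evaluating the integral as in [12, App. A].
By following the steps used in Section V-B and by choosing

$$\theta = \frac{-1}{n_r + \mathrm{SNR}\, n_r\, \epsilon_{*,T}^2\, \mathsf{E}\bigl[ \|\mathbf{X}\|^2 \bigr]} \tag{123}$$

where $\epsilon_{*,T}^2$ is given in (95), we obtain from (119) and (122)

$$I_T^{\mathrm{gmi}}(\mathrm{SNR}) \ge \frac{1}{L} \sum_{\ell = n_t}^{L-1} \mathsf{E}\Bigl[ \log\det\Bigl( \mathbf{I}_{n_r} + \frac{\mathrm{SNR}}{n_t n_r + n_t n_r \mathrm{SNR}\, \epsilon_{*,T}^2\, \mathsf{E}[\|\mathbf{X}\|^2]} \hat{\mathbf{H}}_\ell^{(T)} \hat{\mathbf{H}}_\ell^{(T)\dagger} \Bigr) \Bigr] - \frac{L - n_t}{L} \bigl( 1 + \log K \bigr). \tag{124}$$

Taking the limit as $T$ tends to infinity, and repeating the steps used in Section V-B, yields

\begin{align}
I^{\mathrm{gmi}}(\mathrm{SNR}) &= \lim_{T \to \infty} I_T^{\mathrm{gmi}}(\mathrm{SNR}) \tag{125} \\
&\ge \frac{L - n_t}{L} \Bigl( \mathsf{E}\Bigl[ \log\det\Bigl( \frac{\mathrm{SNR}}{n_t n_r + n_t n_r \mathrm{SNR}\, \epsilon^2\, \mathsf{E}[\|\mathbf{X}\|^2]} \bar{\mathbf{H}} \bar{\mathbf{H}}^\dagger \Bigr) \Bigr] - 1 - \log K \Bigr) \tag{126} \\
&= \frac{L - n_t}{L} \Bigl( n_t \log \mathrm{SNR} - n_t \log\bigl( n_t^2 + n_t^2\, \mathrm{SNR}\, \epsilon^2\, \mathsf{E}[\|\mathbf{X}\|^2] \bigr) + \mathsf{E}\bigl[ \log\det \bar{\mathbf{H}} \bar{\mathbf{H}}^\dagger \bigr] - 1 - \log K \Bigr) \tag{127}
\end{align}

where we have again used the assumption $n_t = n_r$. We conclude by evaluating the limiting ratio
of the RHS of (127) to $\log \mathrm{SNR}$ as SNR tends to infinity. Using (108) and that $\mathsf{E}[\|\mathbf{X}\|^2] \le n_t$
yields

$$\lim_{\mathrm{SNR} \to \infty} \frac{\log\bigl( n_t^2 + n_t^2\, \mathrm{SNR}\, \epsilon^2\, \mathsf{E}[\|\mathbf{X}\|^2] \bigr)}{\log \mathrm{SNR}} = 0. \tag{128}$$

This in turn yields together with (113) that

$$\lim_{\mathrm{SNR} \to \infty} \frac{I^{\mathrm{gmi}}(\mathrm{SNR})}{\log \mathrm{SNR}} \ge n_t \Bigl( 1 - \frac{n_t}{L} \Bigr) \tag{129}$$

provided that

$$\lim_{\mathrm{SNR} \to \infty} \frac{\log K}{\log \mathrm{SNR}} = 0. \tag{130}$$

This concludes the proof.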
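
As one hypothetical example of a non-Gaussian input covered by this argument (it is not taken from the text), consider $n_t = 1$ and an input uniformly distributed on the disc of radius $\rho$ in the complex plane. Its density is $1/(\pi\rho^2)$ on the disc, its second moment is $\rho^2/2$, and the smallest constant for which (116) holds is $K = e^{\rho^2}/\rho^2$, which does not depend on the SNR, so (117) holds as well. The sketch below checks these values numerically on a radial grid.

```python
import math


def uniform_disc_input(rho):
    """Second moment and smallest admissible K in (116) for a uniform input on a disc."""
    second_moment = rho ** 2 / 2.0
    k_const = math.exp(rho ** 2) / rho ** 2
    return second_moment, k_const


def satisfies_density_bound(rho, k_const, grid=1000, tol=1e-12):
    """Check p_X(x) <= (K / pi) exp(-|x|^2) for |x| on a grid in [0, rho] (n_t = 1)."""
    density = 1.0 / (math.pi * rho ** 2)
    for i in range(grid + 1):
        r = rho * i / grid
        if density > (k_const / math.pi) * math.exp(-r ** 2) * (1.0 + tol):
            return False
    return True


if __name__ == "__main__":
    second_moment, k_const = uniform_disc_input(1.0)
    print(second_moment, k_const, satisfies_density_bound(1.0, k_const))
```

The power constraint $\mathsf{E}[|X|^2] \le n_t = 1$ is met in this example whenever $\rho^2 \le 2$.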



VI. Proof of Theorem 2

In contrast to the proof of Theorem 1, for the fading MAC, it is not sufficient to restrict
ourselves to the case of $n_{t,1} = n_{t,2} = n_r$. For example, increasing $n_r$ beyond $n_{t,1}$ and $n_{t,2}$
does not increase the single-rate pre-logs $\Pi_{R_1^*}$ and $\Pi_{R_2^*}$, but it does increase the pre-log of the
achievable sum-rate $\Pi_{R_{1+2}^*}$. For the proof of Theorem 2, we therefore consider a general setup
of $n_{t,1}$, $n_{t,2}$ and $n_r$.

We derive the achievable pre-logs for the MAC case using a similar approach to the point-
to-point case. We first consider the average error probability, averaged over the ensemble of
i.i.d. Gaussian codebooks. Let $\bar{P}_e$ and $\bar{P}_e(m_1, m_2)$ be the ensemble-average error probability and
the ensemble-average error probability corresponding to messages $m_1$ and $m_2$ being transmitted,
respectively. Due to the symmetry of the codebook construction, $\bar{P}_e$ is equal to $\bar{P}_e(1, 1)$ and it
therefore suffices to consider $\bar{P}_e(1, 1)$ to derive the achievable rates. Let $\mathcal{E}(m_1', m_2')$ denote the
event that $D(m_1', m_2') \le D(1, 1)$. Using the union bound, the error probability $\bar{P}_e(1, 1)$ can be
upper-bounded as

\begin{align}
\bar{P}_e(1, 1) &= \Pr\Bigl\{ \bigcup_{(m_1', m_2') \ne (1,1)} \mathcal{E}(m_1', m_2') \Bigr\} \tag{131} \\
&\le \Pr\Bigl\{ \bigcup_{m_1' \ne 1} \mathcal{E}(m_1', 1) \Bigr\} + \Pr\Bigl\{ \bigcup_{m_2' \ne 1} \mathcal{E}(1, m_2') \Bigr\} + \Pr\Bigl\{ \bigcup_{m_1' \ne 1} \bigcup_{m_2' \ne 1} \mathcal{E}(m_1', m_2') \Bigr\}. \tag{132}
\end{align}
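We next analyze these probabilities corresponding to the error events $(m_1' \ne 1, m_2' = 1)$, $(m_1' =
1, m_2' \ne 1)$ and $(m_1' \ne 1, m_2' \ne 1)$. Let the matrix $\mathbf{E}_{s,k}^{(T)}$, $s = 1, 2$, with entries $E_{s,k}^{(T)}(r,t)$ be the
estimation-error matrix in estimating $\mathbf{H}_{s,k}$, i.e.,

$$\mathbf{E}_{s,k}^{(T)} = \mathbf{H}_{s,k} - \hat{\mathbf{H}}_{s,k}^{(T)}. \tag{133}$$

To facilitate the analysis, we first generalize $F(\mathrm{SNR})$ and $\mathcal{T}_\delta$ in the point-to-point case (cf. (70)
and (71)) to the MAC case, i.e.,

$$F(\mathrm{SNR}) = n_r + \frac{\mathrm{SNR}}{L - n_{t,1} - n_{t,2}} \sum_{\ell = n_{t,1} + n_{t,2}}^{L-1} \Bigl( \mathsf{E}\Bigl[ \bigl\| \mathbf{E}_{1,\ell}^{(T)} \bigr\|_F^2 \Bigr] + \mathsf{E}\Bigl[ \bigl\| \mathbf{E}_{2,\ell}^{(T)} \bigr\|_F^2 \Bigr] \Bigr) \tag{134}$$

$$\mathcal{T}_\delta \triangleq \Bigl\{ \bigl(\mathbf{x}_{s,k}, \mathbf{y}_k, \hat{\mathbf{H}}_{s,k}^{(T)}\bigr),\ k = 0, \ldots, n' - 1,\ s = 1, 2 : \Bigl| \frac{1}{n} \sum_{k \in \mathcal{D}^{(n')}} \Bigl\| \mathbf{y}_k - \sqrt{\mathrm{SNR}}\, \hat{\mathbf{H}}_{1,k}^{(T)} \mathbf{x}_{1,k} - \sqrt{\mathrm{SNR}}\, \hat{\mathbf{H}}_{2,k}^{(T)} \mathbf{x}_{2,k} \Bigr\|^2 - F(\mathrm{SNR}) \Bigr| < \delta \Bigr\} \tag{135}$$

for some $\delta > 0$, with $n'$ given in (41) and $\mathcal{D}^{(n')} = \{0, \ldots, n' - 1\} \cap \mathcal{D}$. Using $F(\mathrm{SNR})$ and
the typical set $\mathcal{T}_\delta$, we continue by evaluating the GMI for each of the three probabilities on
the RHS of (132), corresponding to the error events $(m_1' \ne 1, m_2' = 1)$, $(m_1' = 1, m_2' \ne 1)$ and
$(m_1' \ne 1, m_2' \ne 1)$.

1) Error Event $(m_1' \ne 1, m_2' = 1)$: Following the steps used in Section V-B to derive (80),
we can upper-bound the ensemble-average error probability for the error event $\mathcal{E}(m_1', 1)$, $m_1' \ne 1$
using $\mathcal{T}_\delta$ and its complement $\mathcal{T}_\delta^c$ as

$$\Pr\Bigl\{ \bigcup_{m_1' \ne 1} \mathcal{E}(m_1', 1) \Bigr\} \le e^{nR_1} \cdot \Pr\Bigl\{ \frac{1}{n} D(m_1', 1) < F(\mathrm{SNR}) + \delta \Bigm| \bigl\{ \mathbf{X}_s^{n'}(1), \mathbf{Y}^{n'}, \hat{\mathbf{H}}_s^{(T),n'},\ s = 1, 2 \bigr\} \in \mathcal{T}_\delta \Bigr\} + \Pr\Bigl\{ \bigl\{ \mathbf{X}_s^{n'}(1), \mathbf{Y}^{n'}, \hat{\mathbf{H}}_s^{(T),n'},\ s = 1, 2 \bigr\} \in \mathcal{T}_\delta^c \Bigr\}, \quad m_1' \ne 1. \tag{136}$$

Note that the second probability on the RHS of (136) vanishes as $n$ tends to infinity, which can
be shown along the lines of the proof of Lemma 2.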
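The GMI for User 1 gives the rate of exponential decay of the term

$$\Pr\Bigl\{ \frac{1}{n} D(m_1', 1) < F(\mathrm{SNR}) + \delta \Bigm| \bigl\{ \mathbf{X}_s^{n'}(1), \mathbf{Y}^{n'}, \hat{\mathbf{H}}_s^{(T),n'},\ s = 1, 2 \bigr\} \in \mathcal{T}_\delta \Bigr\} \tag{137}$$

as $n \to \infty$. The evaluation of the GMI for User 1 requires the expression of the log moment-
generating function of the metric $D(m_1', 1)$ associated with an incorrect message $m_1' \ne 1$,
conditioned on the channel outputs, on $m_2' = 1$, and on the fading estimates, which we shall
denote by $\kappa_{1,n}\bigl(\theta, \mathbf{y}^{n'}, \mathbf{x}_2^{n'}(1), \hat{\mathbf{H}}_1^{(T),n'}, \hat{\mathbf{H}}_2^{(T),n'}\bigr)$, i.e.,

$$\kappa_{1,n}\bigl(\theta, \mathbf{y}^{n'}, \mathbf{x}_2^{n'}(1), \hat{\mathbf{H}}_1^{(T),n'}, \hat{\mathbf{H}}_2^{(T),n'}\bigr) = \log \mathsf{E}\Bigl[ \exp\Bigl( \frac{\theta}{n} \sum_{k \in \mathcal{D}^{(n')}} D_k(m_1', 1) \Bigr) \Bigm| \mathbf{y}^{n'}, \mathbf{x}_2^{n'}(1), \hat{\mathbf{H}}_1^{(T),n'}, \hat{\mathbf{H}}_2^{(T),n'} \Bigr] \tag{138}$$

where we have defined

$$D_k(m_1', 1) \triangleq \Bigl\| \mathbf{y}_k - \sqrt{\mathrm{SNR}}\, \hat{\mathbf{H}}_{1,k}^{(T)} \mathbf{X}_{1,k}(m_1') - \sqrt{\mathrm{SNR}}\, \hat{\mathbf{H}}_{2,k}^{(T)} \mathbf{x}_{2,k}(1) \Bigr\|^2. \tag{139}$$

Following the steps used in Section V-B to obtain (84) and (85), it can be shown that

$$\kappa_{1,n}\bigl(\theta, \mathbf{y}^{n'}, \mathbf{x}_2^{n'}(1), \hat{\mathbf{H}}_1^{(T),n'}, \hat{\mathbf{H}}_2^{(T),n'}\bigr) = \sum_{k \in \mathcal{D}^{(n')}} \Bigl( \frac{\theta}{n} \bigl( \mathbf{y}_k - \sqrt{\mathrm{SNR}}\, \hat{\mathbf{H}}_{2,k}^{(T)} \mathbf{x}_{2,k}(1) \bigr)^\dagger \Bigl( \mathbf{I}_{n_r} - \frac{\theta}{n} \mathrm{SNR}\, \hat{\mathbf{H}}_{1,k}^{(T)} \hat{\mathbf{H}}_{1,k}^{(T)\dagger} \Bigr)^{-1} \bigl( \mathbf{y}_k - \sqrt{\mathrm{SNR}}\, \hat{\mathbf{H}}_{2,k}^{(T)} \mathbf{x}_{2,k}(1) \bigr) - \log\det\Bigl( \mathbf{I}_{n_r} - \frac{\theta}{n} \mathrm{SNR}\, \hat{\mathbf{H}}_{1,k}^{(T)} \hat{\mathbf{H}}_{1,k}^{(T)\dagger} \Bigr) \Bigr). \tag{140}$$

Then, following the steps used in Section V-B to derive (86)-(88), we have that for all $\theta < 0$

\begin{align}
\lim_{n \to \infty} \frac{1}{n} \kappa_{1,n}\bigl(n\theta, \mathbf{y}^{n'}, \mathbf{x}_2^{n'}(1), \hat{\mathbf{H}}_1^{(T),n'}, \hat{\mathbf{H}}_2^{(T),n'}\bigr)
&= \frac{1}{L - n_{t,1} - n_{t,2}} \sum_{\ell = n_{t,1} + n_{t,2}}^{L-1} \Bigl( g_{1,\ell}(\theta, \mathrm{SNR}) - \mathsf{E}\Bigl[ \log\det\Bigl( \mathbf{I}_{n_r} - \theta\, \mathrm{SNR}\, \hat{\mathbf{H}}_{1,\ell}^{(T)} \hat{\mathbf{H}}_{1,\ell}^{(T)\dagger} \Bigr) \Bigr] \Bigr) \tag{141} \\
&\triangleq \kappa_1(\theta, \mathrm{SNR}) \nonumber
\end{align}

almost surely, where (141) should be regarded as the definition of $\kappa_1(\theta, \mathrm{SNR})$. Here we define

$$g_{1,\ell}(\theta, \mathrm{SNR}) \triangleq \mathsf{E}\Bigl[ \theta \bigl( \mathbf{Y}_\ell - \sqrt{\mathrm{SNR}}\, \hat{\mathbf{H}}_{2,\ell}^{(T)} \mathbf{X}_{2,\ell} \bigr)^\dagger \Bigl( \mathbf{I}_{n_r} - \theta\, \mathrm{SNR}\, \hat{\mathbf{H}}_{1,\ell}^{(T)} \hat{\mathbf{H}}_{1,\ell}^{(T)\dagger} \Bigr)^{-1} \bigl( \mathbf{Y}_\ell - \sqrt{\mathrm{SNR}}\, \hat{\mathbf{H}}_{2,\ell}^{(T)} \mathbf{X}_{2,\ell} \bigr) \Bigr]. \tag{142}$$

Following the derivation in [11], [12], we can then upper-bound the ensemble-average error
probability of the error event $(\mathcal{E}(m_1', 1),\ m_1' \ne 1)$ for any $\delta' > 0$ as

$$\Pr\Bigl\{ \bigcup_{m_1' \ne 1} \mathcal{E}(m_1', 1) \Bigr\} \le \exp(n R_1) \exp\Bigl( -n \bigl( I_{1,T}^{\mathrm{gmi}}(\mathrm{SNR}) - \delta' \bigr) \Bigr) + \varepsilon_1(\delta', n) \tag{143}$$

for some $\varepsilon_1(\delta', n)$ satisfying

$$\lim_{n \to \infty} \varepsilon_1(\delta', n) = 0, \quad \delta' > 0. \tag{144}$$

On the RHS of (143), $I_{1,T}^{\mathrm{gmi}}(\mathrm{SNR})$ denotes the GMI for User 1 as a function of SNR for a fixed
$T$ and is given by

$$I_{1,T}^{\mathrm{gmi}}(\mathrm{SNR}) = \frac{L - n_{t,1} - n_{t,2}}{L} \Bigl( \sup_{\theta < 0} \bigl( \theta F(\mathrm{SNR}) - \kappa_1(\theta, \mathrm{SNR}) \bigr) \Bigr). \tag{145}$$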

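The pre-factor $(L - n_{t,1} - n_{t,2})/L$ equals the fraction of time used for data transmission. The
bound (143) implies that for all rates below $I_{1,T}^{\mathrm{gmi}}(\mathrm{SNR})$, decoding the message from User 1
using the scheme described in Section IV has vanishing error probability as $n$ tends to infinity.
Combining (134) and (141) with (145) yields

$$I_{1,T}^{\mathrm{gmi}}(\mathrm{SNR}) = \sup_{\theta < 0} \frac{1}{L} \sum_{\ell = n_{t,1} + n_{t,2}}^{L-1} \Bigl\{ \theta \Bigl( n_r + \mathrm{SNR}\, \mathsf{E}\Bigl[ \bigl\| \mathbf{E}_{1,\ell}^{(T)} \bigr\|_F^2 + \bigl\| \mathbf{E}_{2,\ell}^{(T)} \bigr\|_F^2 \Bigr] \Bigr) - g_{1,\ell}(\theta, \mathrm{SNR}) + \mathsf{E}\Bigl[ \log\det\Bigl( \mathbf{I}_{n_r} - \theta\, \mathrm{SNR}\, \hat{\mathbf{H}}_{1,\ell}^{(T)} \hat{\mathbf{H}}_{1,\ell}^{(T)\dagger} \Bigr) \Bigr] \Bigr\}. \tag{146}$$

As the supremum in (146) is difficult to evaluate, we next consider a lower bound on $I_{1,T}^{\mathrm{gmi}}(\mathrm{SNR})$.
By noting that $g_{1,\ell}(\theta, \mathrm{SNR}) \le 0$ for $\theta < 0$ (which can be shown using the technique developed in
[20, App. D]) and by choosing

$$\theta = \frac{-1}{n_r + n_r (n_{t,1} + n_{t,2})\, \mathrm{SNR}\, \epsilon_{*,T}^2} \tag{147}$$

where

$$\epsilon_{*,T}^2 = \max_{\substack{s = 1, 2, \\ r = 1, \ldots, n_r, \\ t = 1, \ldots, n_{t,s}, \\ \ell = n_{t,1} + n_{t,2}, \ldots, L-1}} \mathsf{E}\Bigl[ \bigl| E_{s,\ell}^{(T)}(r,t) \bigr|^2 \Bigr] \tag{148}$$

we obtain a lower bound on $I_{1,T}^{\mathrm{gmi}}(\mathrm{SNR})$

$$I_{1,T}^{\mathrm{gmi}}(\mathrm{SNR}) \ge \frac{1}{L} \sum_{\ell = n_{t,1} + n_{t,2}}^{L-1} \Bigl( \mathsf{E}\Bigl[ \log\det\Bigl( \mathbf{I}_{n_r} + \frac{\mathrm{SNR}\, \hat{\mathbf{H}}_{1,\ell}^{(T)} \hat{\mathbf{H}}_{1,\ell}^{(T)\dagger}}{n_r + n_r (n_{t,1} + n_{t,2})\, \mathrm{SNR}\, \epsilon_{*,T}^2} \Bigr) \Bigr] - 1 \Bigr). \tag{149}$$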



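We continue by analyzing the RHS of (149) in the limit as the observation window $T$ of the
channel estimator tends to infinity. To this end, we note that, for $L \le \frac{1}{2\lambda_D}$, the variance of the
interpolation error $\mathsf{E}\bigl[ |E_{s,\ell}^{(T)}(r,t)|^2 \bigr]$ tends to (15) (with SNR in (15) replaced by $n_t \mathrm{SNR}$), so

$$\lim_{T \to \infty} \mathsf{E}\Bigl[ \bigl| E_{s,\ell}^{(T)}(r,t) \bigr|^2 \Bigr] = 1 - \int_{-1/2}^{1/2} \frac{\mathrm{SNR}\, [f_H(\lambda)]^2}{\mathrm{SNR}\, f_H(\lambda) + L}\, d\lambda \tag{150}$$

irrespective of $s$, $\ell$, $r$ and $t$. Hence, irrespective of $\ell$, the estimate $\hat{\mathbf{H}}_{1,\ell}^{(T)}$ tends to $\bar{\mathbf{H}}_1$ in distribution
as $T$ tends to infinity, so

$$\frac{\hat{\mathbf{H}}_{1,\ell}^{(T)}}{\sqrt{n_r + n_r (n_{t,1} + n_{t,2})\, \mathrm{SNR}\, \epsilon_{*,T}^2}} \xrightarrow{d} \frac{\bar{\mathbf{H}}_1}{\sqrt{n_r + n_r (n_{t,1} + n_{t,2})\, \mathrm{SNR}\, \epsilon^2}} \tag{151}$$

where the $n_r \times n_{t,1}$ entries of $\bar{\mathbf{H}}_1$ are i.i.d., circularly-symmetric, complex-Gaussian random
variables with zero mean and variance $(1 - \epsilon^2)$, with $\epsilon^2$ denoting the limit in (150). Using Portmanteau's lemma (as used in (102)),
we obtain that

\begin{align}
I_1^{\mathrm{gmi}}(\mathrm{SNR}) &= \lim_{T \to \infty} I_{1,T}^{\mathrm{gmi}}(\mathrm{SNR}) \tag{152} \\
&\ge \frac{L - n_{t,1} - n_{t,2}}{L} \Bigl( \mathsf{E}\Bigl[ \log\det\Bigl( \mathbf{I}_{n_r} + \frac{\mathrm{SNR}\, \bar{\mathbf{H}}_1 \bar{\mathbf{H}}_1^\dagger}{n_r + n_r (n_{t,1} + n_{t,2})\, \mathrm{SNR}\, \epsilon^2} \Bigr) \Bigr] - 1 \Bigr) \tag{153} \\
&\ge \frac{L - n_{t,1} - n_{t,2}}{L} \Bigl( \min(n_r, n_{t,1}) \Bigl[ \log \mathrm{SNR} - \log\bigl( n_r + n_r (n_{t,1} + n_{t,2})\, \mathrm{SNR}\, \epsilon^2 \bigr) \Bigr] + \Psi_1 \Bigr) \tag{154}
\end{align}

where

$$\Psi_1 \triangleq \begin{cases} \mathsf{E}\bigl[ \log\det \bar{\mathbf{H}}_1 \bar{\mathbf{H}}_1^\dagger \bigr] - 1, & n_r \le n_{t,1} \\ \mathsf{E}\bigl[ \log\det \bar{\mathbf{H}}_1^\dagger \bar{\mathbf{H}}_1 \bigr] - 1, & n_r > n_{t,1}. \end{cases} \tag{155}$$

Here the last inequality follows by lower-bounding $\log\det(\mathbf{I} + \mathbf{A}) \ge \log\det \mathbf{A}$.

By evaluating the RHS of (154) in the same way as evaluating the RHS of (105) in Section
V-B, we obtain a lower bound for the maximum achievable pre-log for User 1 as

$$\Pi_{R_1^*} \ge \min(n_r, n_{t,1}) \Bigl( 1 - \frac{n_{t,1} + n_{t,2}}{L} \Bigr), \quad L \le \frac{1}{2\lambda_D}. \tag{156}$$

Here instead of assuming $n_t = n_{t,1} + n_{t,2} = n_r$, we have used a general setup of $n_{t,1}$, $n_{t,2}$ and
$n_r$. Note that the condition $L \le 1/(2\lambda_D)$ is necessary since otherwise (15) would not hold. This
yields one boundary of the pre-log region presented in Theorem 2.

Footnote on the choice of $\theta$ in (147): As pointed out in Section V, this choice of $\theta$ yields a good lower bound at high SNR.

Footnote on the replacement of SNR by $n_t \mathrm{SNR}$ in (15): Note the difference between the point-to-point channel model (1) and the MAC channel model (40).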

2) Error Event $(m_1' = 1, m_2' \ne 1)$: This follows from the proof for the error event $(m_1' \ne
1, m_2' = 1)$ by swapping User 1 with User 2. We thus have

$$\Pi_{R_2^*} \ge \min(n_r, n_{t,2}) \Bigl( 1 - \frac{n_{t,1} + n_{t,2}}{L} \Bigr), \quad L \le \frac{1}{2\lambda_D} \tag{157}$$

yielding the second boundary of the pre-log region presented in Theorem 2.



3) Error Event $\{m_1' \neq 1, m_2' \neq 1\}$: The analysis of the achievable sum rate corresponding to
the joint error event $\mathcal{E}(m_1', m_2')$, $\{m_1' \neq 1, m_2' \neq 1\}$, in the MAC case follows the same analysis
as in the point-to-point case (Section V-B). More specifically, the sum of the GMI $I^{(T)}_{1+2}(\mathrm{SNR})$
can be viewed as the GMI of an $n_r \times (n_{t,1} + n_{t,2})$-dimensional point-to-point MIMO channel
with fading matrix at time $k$, $[\mathbb{H}_{1,k}\ \mathbb{H}_{2,k}]$, and fading estimate matrix at time $k$, $[\hat{\mathbb{H}}^{(T)}_{1,k}\ \hat{\mathbb{H}}^{(T)}_{2,k}]$.
The maximum achievable sum-rate pre-log can therefore be obtained using the same approaches
as in Section V-B but with arbitrary $n_r$ and $n_t = n_{t,1} + n_{t,2}$. It can be shown that the maximum
achievable sum-rate pre-log $\Pi_{R^*_{1+2}}$ is lower-bounded by

$$\Pi_{R^*_{1+2}} \geq \min(n_r, n_{t,1} + n_{t,2}) \left(1 - \frac{n_{t,1} + n_{t,2}}{L}\right), \quad L \leq \frac{1}{2\lambda_D}. \tag{158}$$

On the RHS of (158), the term $\min(n_r, n_{t,1} + n_{t,2})$ corresponds to the MIMO gain, which is
given by the minimum number of receive and transmit antennas, and the term $\left(1 - \frac{n_{t,1} + n_{t,2}}{L}\right)$
corresponds to the fraction of time for data transmission, which changes for an arbitrary number
of transmit antennas in comparison to the proof of the point-to-point channel. This yields the
third boundary of the pre-log region presented in Theorem 2.
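
The three boundaries (156)-(158) are easy to evaluate side by side. The following is a minimal sketch, not part of the original derivation: the function name `mac_prelog_bounds` and the dictionary keys are hypothetical, and the caller is assumed to have already chosen a pilot period $L$ satisfying $L \leq 1/(2\lambda_D)$, which the code does not check.

```python
from fractions import Fraction

def mac_prelog_bounds(n_r, n_t1, n_t2, L):
    """Evaluate the lower bounds (156)-(158) for a given pilot period L.

    Assumes (not checked here) that L <= 1/(2*lambda_D), as required
    for the bounds to hold.
    """
    n_t = n_t1 + n_t2
    if L <= n_t:
        raise ValueError("L must exceed n_t1 + n_t2 so that some slots carry data")
    # Fraction of each length-L block left for data after n_t pilot slots.
    data_fraction = 1 - Fraction(n_t, L)
    return {
        "user1": min(n_r, n_t1) * data_fraction,   # bound (156)
        "user2": min(n_r, n_t2) * data_fraction,   # bound (157)
        "sum": min(n_r, n_t) * data_fraction,      # bound (158)
    }

bounds = mac_prelog_bounds(n_r=2, n_t1=1, n_t2=1, L=8)
```

With two single-antenna users, two receive antennas and $L = 8$, the data fraction is $3/4$, so each single-user bound is $3/4$ and the sum bound is $3/2$. Exact rational arithmetic is used only to keep the example free of rounding noise.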

VII. Conclusion 

In this paper we studied a communication scheme for MIMO fading channels that estimates 
the fading via transmission of pilot symbols at regular intervals, and feeds the fading estimates 
to the nearest neighbor decoder. Restricting ourselves to fading processes with a bandlimited 
power spectral density, we studied the information rates achievable with this scheme at high 
SNR. Specifically, we analyzed the achievable rate pre-log, defined as the limiting ratio of the 
achievable rate to the logarithm of the SNR in the limit as the SNR tends to infinity. 

We showed that, in order to obtain fading estimates whose variance vanishes as the SNR tends
to infinity, the portion of time required for pilot transmission must be greater than or equal to the
number of transmit antennas times twice the bandwidth of the fading power spectral density. We
demonstrated that, in this case, the nearest neighbor decoder achieves the capacity pre-log of 
the coherent fading channel times the fraction of time used for the transmission of data. Hence, 
the loss with respect to the coherent case is solely due to the transmission of pilots used to 
obtain accurate fading estimates. Our achievability bounds are tight in the sense that any scheme 
using as many pilots as our proposed scheme cannot achieve a higher pre-log using a nearest 
neighbor decoder. Furthermore, if the inverse of twice the bandwidth of the fading process is an 
integer, then, for MISO channels, our scheme achieves the capacity pre-log of the non-coherent 
fading channel derived by Koch and Lapidoth [9]. For non-coherent MIMO channels, our scheme
achieves the best lower bound known so far on the capacity pre-log, obtained by Etkin and Tse
[10]. Since the last result only yields a lower bound on the capacity pre-log of MIMO channels,
there may exist other schemes achieving a better pre-log than our scheme.
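
The trade-off summarized above can be illustrated numerically for the point-to-point case. This is a sketch under stated assumptions rather than a restatement of the theorem: the helper names are hypothetical, $n_t$ pilot slots are assumed per block of $L$ symbols, and the pre-log is taken as the coherent pre-log $\min(n_t, n_r)$ times the data fraction $1 - n_t/L$, as described in the conclusion.

```python
from fractions import Fraction
from math import floor

def largest_pilot_period(two_lambda_d):
    """Largest integer L with L <= 1/(2*lambda_D); argument is 2*lambda_D."""
    return floor(1 / Fraction(two_lambda_d))

def prelog_with_pilots(n_t, n_r, L):
    """Coherent pre-log min(n_t, n_r) scaled by the data fraction 1 - n_t/L."""
    return min(n_t, n_r) * (1 - Fraction(n_t, L))

# Example: fading bandwidth lambda_D = 1/10, so 2*lambda_D = 1/5.
L_star = largest_pilot_period(Fraction(1, 5))
pilot_fraction = Fraction(2, L_star)          # n_t = 2 pilot slots per block
prelog = prelog_with_pilots(n_t=2, n_r=3, L=L_star)
```

Here $1/(2\lambda_D) = 5$ is an integer, so the pilot fraction $2/5$ equals $n_t \cdot 2\lambda_D$ exactly, and the resulting pre-log is $2 \cdot (1 - 2/5) = 6/5$.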

Appendix A 
Proof of Lemma 1

1) By the orthogonality principle [25], we have that $\hat{H}_k^{(T)}(r,t)$ and $E_k^{(T)}(r,t)$ are uncorrelated.
Noting that the pilot symbols are unity, we can write (67) as

$$\hat{H}_k^{(T)}(r,t) = \sum_{\substack{k' = k - TL:\\ k' \in \mathcal{P}}}^{k+TL} a_{k'}(r,t) \left( \sqrt{\frac{\mathrm{SNR}}{n_t}}\, H_{k'}(r,t) + Z_{k'}(r) \right), \quad k \in \mathcal{D}. \tag{159}$$

Since the processes $\{H_k(r,t),\ k \in \mathbb{Z}\}$ and $\{Z_k(r),\ k \in \mathbb{Z}\}$ are zero-mean complex-
Gaussian processes, we have from (159) and the orthogonality principle that $\hat{H}_k^{(T)}(r,t)$
and $E_k^{(T)}(r,t)$ are independent zero-mean complex-Gaussian random variables.
2) Recall from Section V-A that the time index $k$ can be written as $k = jL + \ell$. Then, for
$k \in \mathcal{D}$, we have $\ell = n_t, \dots, L-1$, and for $k \in \mathcal{P}$ we have $\ell = 0, \dots, n_t - 1$. Since the
pilot vectors are transmitted sequentially from $\mathbf{p}_1$ to $\mathbf{p}_{n_t}$, we have for $k \in \mathcal{P}$ that

$$\mathbf{x}_{jL+\ell} = \mathbf{p}_{\ell+1}, \quad \ell = 0, \dots, n_t - 1 \tag{160}$$

namely the $(\ell+1)$-th pilot vector, $\ell = 0, \dots, n_t - 1$, is used to estimate the fading coefficients
from transmit antenna $t = \ell + 1$. We next note that, in order to estimate $H_k(r,t)$, there is no loss
in optimality by considering only the outputs $Y_{k'}(r)$ for $k' \in \mathcal{P} \cap \{k - TL, \dots, k + TL\}$
satisfying

$$k' \bmod L = t - 1. \tag{161}$$

Indeed, the channel outputs $Y_{k'}(r)$, $k' \bmod L \neq t - 1$, correspond to $H_{k'}(r,t')$, $t' \neq t$,
which are independent from $H_k(r,t)$ since we have assumed that the fading processes
corresponding to different transmit and receive antennas are independent. It follows that
for the estimation at $k = jL + \ell$, the coefficients $a_{k'}(r,t)$ that minimize the mean-squared
error depend only on $L$ and $\ell$ [14]. The fading estimate (67) can then be expressed as

$$\hat{H}^{(T)}_{jL+\ell}(r,t) = \sum_{\tau=-T}^{T-1} a_{\tau L,\ell}(r,t)\, Y_{(j-\tau)L+t-1}(r) \tag{162}$$

$$= \sum_{\tau=-T}^{T-1} a_{\tau L,\ell}(r,t) \left( \sqrt{\frac{\mathrm{SNR}}{n_t}}\, H_{(j-\tau)L+t-1}(r,t) + Z_{(j-\tau)L+t-1}(r) \right) \tag{163}$$

where for a given $L$ and $\ell = n_t, \dots, L-1$, we have defined

$$a_{\tau L,\ell}(r,t) = a_{(j-\tau)L+t-1}(r,t), \quad \tau = -T, \dots, T-1. \tag{164}$$

Noting again that the $n_r \cdot n_t$ processes $\{H_k(r,t),\ k \in \mathbb{Z}\}$ are independent from each other
and have the same law, we obtain the following results from (163):

a) For a given $t$, the time differences between the index of interest, $jL + \ell$, and the
positions of pilots, $(j - \tau)L + t - 1$, do not depend on $r$. It thus follows that for
a given $t$, the optimal coefficients $a_{\tau L,\ell}(r,t)$ are identical for all $r = 1, \dots, n_r$ [14].
This implies that for a given $t$ and $\ell$, the $n_r$ processes

$$\bigl\{\bigl(\hat{H}^{(T)}_{jL+\ell}(1,t), E^{(T)}_{jL+\ell}(1,t)\bigr),\ j \in \mathbb{Z}\bigr\}, \dots, \bigl\{\bigl(\hat{H}^{(T)}_{jL+\ell}(n_r,t), E^{(T)}_{jL+\ell}(n_r,t)\bigr),\ j \in \mathbb{Z}\bigr\}$$

are independent and have the same law.

b) For a given $r$, the time differences between the index of interest, $jL + \ell$, and
the positions of pilots, $(j - \tau)L + t - 1$ for $\tau = -T, \dots, T-1$, depend on $t$. It thus
follows from [14] that for a given $r$, the optimal coefficients $a_{\tau L,\ell}(r,t)$ are generally
different for $t = 1, \dots, n_t$. This implies that for a given $r$ and $\ell$, the $n_t$ processes

$$\bigl\{\bigl(\hat{H}^{(T)}_{jL+\ell}(r,1), E^{(T)}_{jL+\ell}(r,1)\bigr),\ j \in \mathbb{Z}\bigr\}, \dots, \bigl\{\bigl(\hat{H}^{(T)}_{jL+\ell}(r,n_t), E^{(T)}_{jL+\ell}(r,n_t)\bigr),\ j \in \mathbb{Z}\bigr\}$$

are independent but have different laws.
3) We first note that $\{\mathbb{H}_k,\ k \in \mathbb{Z}\}$ is an ergodic Gaussian process, which implies that it is
also a weakly mixing process [26]. (See [27] for a definition of a weakly mixing process.)
Since $\{Z_k,\ k \in \mathbb{Z}\}$ is an i.i.d. Gaussian process and independent from $\{\mathbb{H}_k,\ k \in \mathbb{Z}\}$, it
follows from [27, Prop. 1.6] that $\{(\mathbb{H}_k, Z_k),\ k \in \mathbb{Z}\}$ is jointly ergodic.
We next evaluate the process $\{(\hat{\mathbb{H}}_k^{(T)}, \mathbb{H}_k, Z_k),\ k \in \mathcal{D}\}$. Note that this process cannot be
expressed directly as a time-invariant function of $\{(\mathbb{H}_k, Z_k),\ k \in \mathbb{Z}\}$. Indeed, by assuming
$k = jL + \ell$, we can see from (163) that the function to produce $\hat{\mathbb{H}}_k^{(T)}$ from $\{(\mathbb{H}_k, Z_k),\ k \in
\mathbb{Z}\}$ depends on the time index $k$ via $\ell$, for $\ell = n_t, \dots, L-1$ (corresponding to time indices
for data transmission). As such, to facilitate the analysis, we need to introduce a "dummy"
matrix-valued process $\{\mathbb{A}_{k,\ell},\ k \in \mathbb{Z}\}$ where $\mathbb{A}_{k,\ell}$ has $n_r \times n_t$ entries, and where its entry
at row $r$ and column $t$ is given by

$$A_{k,\ell}(r,t) = \sum_{\tau=-T}^{T-1} a_{\tau L,\ell}(r,t) \left( \sqrt{\frac{\mathrm{SNR}}{n_t}}\, H_{k-\tau L-\ell+t-1}(r,t) + Z_{k-\tau L-\ell+t-1}(r) \right). \tag{165}$$

Here the coefficients $a_{\tau L,\ell}(r,t)$, $\tau = -T, \dots, T-1$, have the same value as those in (163) for a
given $L$ and $\ell$. Consequently, $\{\mathbb{A}_{k,\ell},\ k \in \mathbb{Z}\}$ is a time-invariant function of $\{(\mathbb{H}_k, Z_k),\ k \in
\mathbb{Z}\}$ that coincides with $\hat{\mathbb{H}}_k^{(T)}$ for $k = jL + \ell$. This in turn implies that $\{(\mathbb{A}_{k,\ell}, \mathbb{H}_k, Z_k),\ k \in
\mathbb{Z}\}$ is jointly weakly mixing. Furthermore, by the definition of weakly mixing [26]-[28],
the process $\{(\mathbb{A}_{jL+\ell',\ell}, \mathbb{H}_{jL+\ell'}, Z_{jL+\ell'}),\ j \in \mathbb{Z}\}$ for any $\ell' = 0, \dots, L-1$ is also jointly
weakly mixing. Since for $k = jL + \ell$, $k \in \mathcal{D}$, the matrix $\mathbb{A}_{jL+\ell,\ell}$ is identical to $\hat{\mathbb{H}}^{(T)}_{jL+\ell}$,
it follows that the process $\{(\hat{\mathbb{H}}^{(T)}_{jL+\ell}, \mathbb{H}_{jL+\ell}, Z_{jL+\ell}),\ j \in \mathbb{Z}\}$ for each $\ell = n_t, \dots, L-1$ is
jointly weakly mixing, which implies ergodicity.

We finally evaluate the joint behavior of the two processes $\{(\hat{\mathbb{H}}^{(T)}_{jL+\ell}, \mathbb{H}_{jL+\ell}, Z_{jL+\ell}),\ j \in \mathbb{Z}\}$
and $\{X_{jL+\ell},\ j \in \mathbb{Z}\}$ for $\ell \in \{n_t, \dots, L-1\}$. Since $\{X_{jL+\ell},\ j \in \mathbb{Z}\}$ for $\ell \in \{n_t, \dots, L-1\}$
is i.i.d. and independent from $\{(\hat{\mathbb{H}}^{(T)}_{jL+\ell}, \mathbb{H}_{jL+\ell}, Z_{jL+\ell}),\ j \in \mathbb{Z}\}$, we have by [29, Lemma
2] that the process

$$\bigl\{\bigl(\hat{\mathbb{H}}^{(T)}_{jL+\ell}, \mathbb{H}_{jL+\ell}, Z_{jL+\ell}, X_{jL+\ell}\bigr),\ j \in \mathbb{Z}\bigr\}, \quad \ell \in \{n_t, \dots, L-1\}$$

is jointly ergodic. This proves Part 3) of Lemma 1.
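4) Note that the process $\{\hat{\mathbb{H}}_k^{(T)},\ k \in \mathcal{D}\}$ is a function of $\{(\mathbb{H}_k, Z_k),\ k \in \mathcal{P}\}$. Since $\{Z_k,\ k \in \mathcal{D}\}$
has zero mean and is independent from $\{(\mathbb{H}_k, Z_k),\ k \in \mathcal{P}\}$ and $\{X_k,\ k \in \mathcal{D}\}$, it follows
that for any $\ell = n_t, \dots, L-1$ (which correspond to $k \in \mathcal{D}$)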

E [zjef =0. (166) 
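
The index bookkeeping used in Part 2) of the proof, namely the decomposition $k = jL + \ell$ and the selection rule (161), can be sketched in a few lines. This is an illustrative sketch only: the function names `slot_role` and `pilot_slots_for_antenna` are hypothetical, and antennas are numbered from 1 as in the text.

```python
def slot_role(k, L, n_t):
    """Classify time index k = j*L + ell as a pilot or a data slot.

    Slots with ell = 0, ..., n_t - 1 carry pilot vector p_{ell+1} (eq. (160));
    slots with ell = n_t, ..., L - 1 carry data.
    """
    ell = k % L
    return ("pilot", ell + 1) if ell < n_t else ("data", None)

def pilot_slots_for_antenna(k, t, L, T):
    """Pilot positions k' in {k - T*L, ..., k + T*L} with k' mod L == t - 1 (eq. (161))."""
    return [kp for kp in range(k - T * L, k + T * L + 1) if kp % L == t - 1]

# Example: L = 4, n_t = 2, window parameter T = 1, data slot k = 6 (j = 1, ell = 2).
role = slot_role(6, L=4, n_t=2)
slots_t1 = pilot_slots_for_antenna(6, t=1, L=4, T=1)
slots_t2 = pilot_slots_for_antenna(6, t=2, L=4, T=1)
```

For this example the window $\{2, \dots, 10\}$ contains the pilot positions 4 and 8 for transmit antenna 1 and 5 and 9 for transmit antenna 2, that is, $2T$ observations per antenna, matching the index set $\tau = -T, \dots, T-1$ in (162).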
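Appendix B

Variance of the Interpolation Error for $L > 1/(2\lambda_D)$

Recall that, as $T$ tends to infinity, the variance of the interpolation error tends, irrespective
of $j$ and $r$, to (11), namely

$$\epsilon_\ell^2(t) = 1 - \int_{-1/2}^{1/2} \frac{\mathrm{SNR}\,|f_{L,\ell-t+1}(\lambda)|^2}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda \tag{167}$$

where

$$f_{L,\ell}(\lambda) = \frac{1}{L} \sum_{\nu=0}^{L-1} f_H\!\left(\frac{\lambda - \nu}{L}\right) e^{i 2\pi \ell \frac{\lambda - \nu}{L}}, \quad -\frac{1}{2} \leq \lambda \leq \frac{1}{2}. \tag{168}$$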



In order to analyze the behavior of $\epsilon_\ell^2(t)$ for $L > 1/(2\lambda_D)$, we first express $L$ as

$$L = \frac{1}{2\lambda_D} + \varepsilon \tag{169}$$



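for some $\varepsilon > 0$. The variance of the interpolation error (167) can be lower-bounded as

$$\epsilon_\ell^2(t) = \int_{-1/2}^{1/2} \frac{n_t\, f_{L,0}(\lambda)}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda + \int_{-1/2}^{1/2} \frac{\mathrm{SNR}\bigl([f_{L,0}(\lambda)]^2 - |f_{L,\ell-t+1}(\lambda)|^2\bigr)}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda \tag{170}$$

$$\geq \int_{-1/2}^{1/2} \frac{\mathrm{SNR}\bigl([f_{L,0}(\lambda)]^2 - |f_{L,\ell-t+1}(\lambda)|^2\bigr)}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda \tag{171}$$

where the inequality is because the first integral in (170) is non-negative. Let $\ell' = \ell - t + 1$. We
have that

$$[f_{L,0}(\lambda)]^2 - |f_{L,\ell'}(\lambda)|^2 = \frac{1}{L^2} \sum_{\nu=0}^{L-1} \sum_{\nu'=0}^{L-1} f_H\!\left(\frac{\lambda-\nu}{L}\right) f_H\!\left(\frac{\lambda-\nu'}{L}\right) \left(1 - e^{-i 2\pi \ell' \frac{\nu - \nu'}{L}}\right) \tag{172}$$

$$= \frac{2}{L^2} \sum_{\nu=0}^{L-1} \sum_{\nu' > \nu} f_H\!\left(\frac{\lambda-\nu}{L}\right) f_H\!\left(\frac{\lambda-\nu'}{L}\right) \left[1 - \cos\!\left(\frac{2\pi \ell' (\nu' - \nu)}{L}\right)\right]. \tag{173}$$

Since the summands are non-negative, it follows that

$$[f_{L,0}(\lambda)]^2 - |f_{L,\ell'}(\lambda)|^2 \geq \frac{2}{L^2}\, f_H\!\left(\frac{\lambda}{L}\right) f_H\!\left(\frac{\lambda-1}{L}\right) \left[1 - \cos\!\left(\frac{2\pi \ell'}{L}\right)\right]. \tag{174}$$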
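The RHS of (171) can thus be lower-bounded as

$$\int_{-1/2}^{1/2} \frac{\mathrm{SNR}\bigl([f_{L,0}(\lambda)]^2 - |f_{L,\ell'}(\lambda)|^2\bigr)}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda \geq \frac{2\left[1 - \cos\!\left(\frac{2\pi \ell'}{L}\right)\right]}{L^2} \int_{\mathcal{C}} \frac{\mathrm{SNR}\, f_H\!\left(\frac{\lambda}{L}\right) f_H\!\left(\frac{\lambda-1}{L}\right)}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda \tag{175}$$

where $\mathcal{C}$ denotes the interval in $[-1/2, 1/2]$ where $f_H\!\left(\frac{\lambda}{L}\right)$ and $f_H\!\left(\frac{\lambda-1}{L}\right)$ overlap. Note that, for
$L = \frac{1}{2\lambda_D} + \varepsilon$, this interval is of Lebesgue measure

$$\mu(\mathcal{C}) = \min(1, 2\lambda_D \varepsilon). \tag{176}$$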



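By Fatou's lemma [30], we obtain

$$\liminf_{\mathrm{SNR}\to\infty} \frac{2\left[1 - \cos\!\left(\frac{2\pi \ell'}{L}\right)\right]}{L^2} \int_{\mathcal{C}} \frac{\mathrm{SNR}\, f_H\!\left(\frac{\lambda}{L}\right) f_H\!\left(\frac{\lambda-1}{L}\right)}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda \geq \frac{2\left[1 - \cos\!\left(\frac{2\pi \ell'}{L}\right)\right]}{L^2} \int_{\mathcal{C}} \liminf_{\mathrm{SNR}\to\infty} \frac{\mathrm{SNR}\, f_H\!\left(\frac{\lambda}{L}\right) f_H\!\left(\frac{\lambda-1}{L}\right)}{\mathrm{SNR}\, f_{L,0}(\lambda) + n_t}\, d\lambda \tag{177}$$

$$= \frac{2\left[1 - \cos\!\left(\frac{2\pi \ell'}{L}\right)\right]}{L^2} \int_{\mathcal{C}} \frac{f_H\!\left(\frac{\lambda}{L}\right) f_H\!\left(\frac{\lambda-1}{L}\right)}{f_{L,0}(\lambda)}\, d\lambda. \tag{178}$$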


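Since $\mathcal{C}$ is of positive Lebesgue measure, and since the integrand on the RHS of (178) is strictly
positive, it follows from [31] that

$$\int_{\mathcal{C}} \frac{f_H\!\left(\frac{\lambda}{L}\right) f_H\!\left(\frac{\lambda-1}{L}\right)}{f_{L,0}(\lambda)}\, d\lambda > 0. \tag{179}$$

Recall that $\ell' = \ell - t + 1$. Thus, for $\ell = n_t, \dots, L-1$, we have

$$\cos\!\left(\frac{2\pi \ell'}{L}\right) < 1. \tag{180}$$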



Then, combining (180) and (179) with (178), (175) and (171) yields

$$\liminf_{\mathrm{SNR}\to\infty} \epsilon_\ell^2(t) > 0. \tag{181}$$
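
The conclusion (181) can be checked numerically by evaluating (167)-(168) directly. The sketch below rests on assumptions that are not in the text: a flat power spectral density $f_H(\lambda) = 1/(2\lambda_D)$ on $[-\lambda_D, \lambda_D]$ with unit power, treated as periodic with period 1 (as for any discrete-time spectrum), a simple midpoint rule for the integral, and hypothetical function names.

```python
import cmath
import math

def psd_rect(x, lam_d):
    """Flat unit-power PSD on [-lam_d, lam_d], extended periodically with period 1."""
    x = x - math.floor(x + 0.5)              # wrap into [-1/2, 1/2)
    return 1.0 / (2.0 * lam_d) if abs(x) <= lam_d else 0.0

def f_L(lam, ell, L, lam_d):
    """Evaluate f_{L,ell}(lam) as in (168)."""
    total = 0j
    for nu in range(L):
        u = (lam - nu) / L
        total += psd_rect(u, lam_d) * cmath.exp(2j * math.pi * ell * u)
    return total / L

def interp_error_variance(snr, L, lam_d, n_t, ell_prime, grid=2000):
    """Midpoint-rule evaluation of (167), with ell_prime = ell - t + 1."""
    acc = 0.0
    for i in range(grid):
        lam = -0.5 + (i + 0.5) / grid
        num = snr * abs(f_L(lam, ell_prime, L, lam_d)) ** 2
        den = snr * f_L(lam, 0, L, lam_d).real + n_t
        acc += num / den
    return 1.0 - acc / grid

# L = 4 <= 1/(2*0.1) = 5: pilots are dense enough, the error variance vanishes.
eps2_ok = interp_error_variance(snr=1e6, L=4, lam_d=0.1, n_t=2, ell_prime=1)
# L = 4 > 1/(2*0.2) = 2.5: pilots are too sparse, an error floor remains.
eps2_floor = interp_error_variance(snr=1e6, L=4, lam_d=0.2, n_t=2, ell_prime=1)
```

For this flat spectrum the first value is on the order of $10^{-6}$, while the second stays near $0.375$ however large the SNR is made, in line with (181). The flat spectrum is chosen only because its aliasing pattern is easy to verify by hand.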